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Abstract 

. Detection algorithms for multiple-input multiple-output (MIMO) wireless systems based on orthogonal 

' frequency-division multiplexing (OFDM) typically require the computation of a QR decomposition for each 

(N ■ 

of the data-carrying OFDM tones. The resulting computational complexity will, in general, be significant, 

o ■ 
O. 

«) . IEEE 802. 16e standard). Motivated by the fact that the channel matrices arising in MIMO-OFDM systems 
are highly oversampled polynomial matrices, we formulate interpolation-based QR decomposition algorithms. 



as the number of data-carrying tones ranges from 48 (as in the IEEE 802.11a/g standards) to 1728 (as in the 



An in-depth complexity analysis, based on a metric relevant for very large scale integration (VLSI) imple- 
mentations, shows that the proposed algorithms, for sufficiently high number of data-carrying tones and 
O . sufficiently small channel order, provably exhibit significantly smaller complexity than brute-force per-tone 
QR decomposition. 
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. 1. Introduction and Outline 

■ The use of orthogonal frequency-division multiplexing (OFDM) drastically reduces data detection com- 

;h ' plexity in wideband multiple-input multiple-output (MIMO) wireless systems by decoupling a frequency- 



selective fading MIMO channel into a set of fiat-fading MIMO channels. Nevertheless, MIMO-OFDM detec- 
tors still pose significant challenges in terms of computational complexity, as processing has to be performed 
on a per-tone basis with the number of data-carrying tones ranging from 48 (as in the IEEE 802.11a/g 
wireless local area network standards) to 1728 (as in the IEEE 802.16 wireless metropolitan area network 
standard). 
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Specifically, in the setting of coherent MIMO-OFDM detection, for which the receiver is assumed to have 
perfect channel knowledge, linear MIMO-OFDM detectors 1^ require matrix inversion, whereas successive 
cancelation receivers [21] and sphere decoders ^, 17| require QR decomposition, in all cases on each of the 
data-carrying OFDM tones. The corresponding computations, termed as preprocessing in the following, 
have to be performed at the rate of change of the channel which, depending on the propagation environ- 
ment, is typically much lower than the rate at which the transmission of actual data symbols takes place. 
Nevertheless, as payload data received during the preprocessing phase must be stored in a dedicated buffer, 
preprocessing represents a major bottleneck in terms of the size of this buffer and the resulting detection 
latency Q- 

In a very large scale integration (VLSI) implementation, the straightforward approach to reducing the 
preprocessing latency is to employ parallel processing over multiple matrix inversion or QR decomposition 
units, which, however, comes at the cost of increased silicon area. In the problem of reducing preprocess- 
ing complexity in linear MIMO-OFDM receivers is addressed on an algorithmic level by formulating efficient 
interpolation-based algorithms for matrix inversion that take the polynomial nature of the MIMO-OFDM 
channel matrix explicitly into account. Specifically, the algorithms proposed in exploit the fact that the 
channel matrices arising in MIMO-OFDM systems are polynomial matrices that are highly oversampled 
on the unit circle. The goal of the present paper is to devise computationally efficient interpolation-based 
algorithms for QR decomposition in MIMO-OFDM systems. Although throughout the paper we focus on 
QR decomposition in the context of coherent MIMO-OFDM detectors, our results also apply to transmit pre- 
coding schemes for MIMO-OFDM (under the assumption of perfect channel knowledge at the transmitter) 
requiring per-tone QR decomposition [2^. 

Contributions. Our contributions can be summarized as follows: 

• We present a new result on the QR decomposition of Laurent polynomial (LP) matrices, based on 
which interpolation-based algorithms for QR decomposition in MIMO-OFDM systems are formulated. 

• Using a computational complexity metric relevant for VLSI implementations, we demonstrate that, for 
a wide range of system parameters, the proposed interpolation-based algorithms exhibit significantly 
smaller complexity than brute-force per-tone QR decomposition. 

• We present different strategies for efficient LP interpolation that take the specific structure of the 
problem at hand into account and thereby enable (often significant) computational complexity savings 
of interpolation-based QR decomposition. 

• We provide a numerical analysis of the trade-off between the computational complexity of the inter- 
polation-based QR decomposition algorithms presented and the performance of corresponding MIMO- 
OFDM detectors. 



Outline of the paper. In Section O we present the mathematical preliminaries needed in the rest of the 
paper. In Section [3l we briefly review the use of QR decomposition in MIMO-OFDM receivers, and we 
formulate the problem statement. In Section [U we present our main technical result on the QR decom- 
position of LP matrices. This result is then used in Section [5] to formulate interpolation-based algorithms 
for QR decomposition of MIMO-OFDM channel matrices. Section [6] contains an in-depth computational 
complexity analysis of the proposed algorithms. In Section [71 we describe the appHcation of the new ap- 
proach to the QR decomposition of the augmented MIMO-OFDM channel matrices arising in the context 
of minimum mean-square error (MMSE) receivers. In Section [8l we discuss methods for LP interpolation 
that exploit the speciflc structure of the problem at hand and exhibit low VLSI implementation complexity. 
Section [9] contains numerical results on the computational complexity of the proposed interpolation-based 
QR decomposition algorithms along with a discussion of the trade-off between algorithm complexity and 
MIMO-OFDM receiver performance. We conclude in Section [TOl 

2. Mathematical Preliminaries 

2.1. Notation 

£PxM (jgnotes the set of complex- valued P x M matrices. U ^ {s £ C : \s\ = 1} indicates the unit 
circle. is the empty set. \A\ stands for the cardinality of the set A. mod is the modulo operator. All 
logarithms are to the base 2. E[-] denotes the expectation operator. CA/'(0, K) stands for the multivariate, 
circularly-symmetric complex Gaussian distribution with covariance matrix K. Throughout the paper, we 
use the following conventions. First, if k2 < ki, X^fcL/ci '^k = 0, regardless of ak. Second, sequences of 
integers of the form ki,ki + A, . . . , k2, with A > 0, simplify to the sequence fci, k2 if k2 = ki + A, to the 
single value fci if k2 = ki, and to the empty sequence if k2 < ki. 

A*, A"^, A-^, A''^, rank(A), and ran(A) denote the entry wise conjugate, the transpose, the conjugate 
transpose, the pseudoinverse, the rank, and the range space, respectively, of the matrix A. [A]p_,„ indicates 
the entry in the pth row and mth column of A. A^^'P^ and A„jj_,„2 stand for the submatrix given by the 
rows pi,pi -I- 1, . . . ,P2 of A and the submatrix given by the columns mi, mi + 1, . . . , m2 of A, respectively. 
Furthermore, we set AJJJ^p^^^ a (^A.^^ ,^^yi--P2 ^nd A'^^ „^^ ^ {Ami,m2)^- A P x M matrix A is said 
to be upper triangular if all entries below its main diagonal {[A]^: ^ : k — 1, 2, . . . , min(P, M)} are equal 
to zero. det(A) and adj(A) denote the determinant and the adjoint of a square matrix A, respectively. 
diag(ai,a2, . . . ,aM) indicates the M x M diagonal matrix with the scalar Om as its mth main diagonal 
element. Im stands for the M x M identity matrix, denotes the all-zeros matrix of appropriate size, 
and Wjvf is the M x M discrete Fourier transform matrix, given by [WM]p+i,q+i — er'^'^^P'^/^'^ (p, q — 
0, 1, . . . , M — 1). Finally, orthogonality and norm of complex-valued vectors ai, a2 are induced by the inner 
product af'^a2. 
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2.2. QR Decomposition 

Throughout this section, we consider a matrix A = [ai a2 • • • a^/] G C^x^^ with P > M, where a^ 
denotes the A:th column of A (fc = 1, 2, . . . , M). In the remainder of the paper, the term QR decomposition 
refers to the following: 

Definition 1. We call any factorization A = QR, for which the matrices Q e £Px-M and R G £Mxm 
satisfy the following conditions, a QR decomposition of A with QR factors Q and R: 

1. the nonzero columns of Q are orthonormal 

2. R is upper triangular with real- valued nonnegative entries on its main diagonal 

3. R = Q^A 

Practical algorithms for QR decomposition are either based on Gram-Schmidt (GS) orthonormalization 
or on unitary transformations (UT). We next briefly review both classes of algorithms. GS-based QR decom- 
position is summarized as follows. For k — 1,2, . . . , M , the fcth column of Q, denoted by qfe, is determined 

by 

fe-i 

Yk = afe - ^ qf afcQi (1) 



with 



Qfc = { (2) 
0, yk=0 

whereas the fcth row of R, denoted by r^, is given by 

rl = qf A. (3) 

UT-based QR decomposition of A is performed by left-multiplying A by the product @u ■ ■ ■ 020i oi P x P 
unitary matrices 0„, where the sequence of matrices ©i, ©2, . . . , @u and the parameter U are not unique 
and are chosen such that the P x AI matrix &u ■ &2&1A is upper triangular with nonnegative real- 
valued entries on its main diagonal. The matrices ©„ are typically either Givens rotation matrices Q| or 
Householder reflection matrices Ql- With R = (©[/••• ©2©iA)i'*^ and Q = ((©[/••• ©2©i)^)i,j\/, we 
obtain that Q^A = R and, since &(j ■ ■ @2&i is unitary, that Q^Q = Therefore, Q and R are 

QR factors of A. For P > M, we note that the P x (P- M) matrix = ((©[/ • • • ©2©i)^)m+i,p satisfles 
(Q-'")''^Q^ = Ip-Af and Q^Q^ = 0. In practice, UT-based QR decomposition of A can be performed as 
follows [S|,13|]- A P X M matrix X and a P x P matrix Y are initialized as X ^ A and Y ^ Ip, respectively, 
and the counter u is set to zero. Then, u is incremented by one, and X and Y are updated according to 
X ^ ©mX and Y ^ ©uY, for an appropriately chosen matrix ©„. This update step is repeated until X 
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becomes upper-triangular with nonnegative real-valued entries on its main diagonal. The parameter U is 
obtained as the final value of the counter u, and the final values of X and Y are 



R 








, Y = 







_ (Q^)^ _ 



Since the uth update step can be represented as [X Y] ^ 0n[X Y], we can describe UT-based QR de- 
composition of A by means of the formal relation 



&U &2&i[A Ip] = 



R 

(Q^)« 



(4) 



which, from now on, will be called standard form of UT-based QR decomposition, and will be needed in 
Section [7!T] in the context of regularized QR decomposition. The standard form ^ shows that for P > M, 
UT-based QR decomposition yields the (P — M) x P matrix (Q^)^ as a by-product. For P = M, the 
right-hand side (RHS) of (g]) reduces to [R ]. 

We note that since yi = is equivalent to ai = and yfe = is equivalent to rank(Ai^fe_i) = rank(Ai^fc) 
{k — 2,3, ... , M) |9], GS-based QR decomposition sets M — rank(A) columns of Q and the corresponding 
M ~ rank(A) rows of R to zero. In contrast, UT-based QR decomposition yields a matrix Q such that 
Q-^Q = Ia/, regardless of the value of rank(A), and sets M — rank(A) entries on the main diagonal of R to 
zero Hence, for rank(A) < Af, different QR decomposition algorithms will in general produce different 
QR factors. 

Proposition 2. //rank(A) —M, Conditions[I\ and[^ of Definition[I\ simplify, respectively, to 

1. Q^Q - Im 

2. R is upper triangular with [RJfc.fc > 0,k — 1,2, . . . , M 

whereas Condition [2| is redundant. Moreover, A has unique QR factors. 

Proof. Since A — QR implies rank(A) < min{rank(Q), rank(R)}, it follows from rank(A) — M that 
rank(Q) = rank(R) = M. Now, rank(Q) = M implies that the P x M matrix Q can not contain all- 
zero columns, and hence Condition [1] is equivalent to Q^Q — Im- Moreover, rank(R) — M implies 
det(R) ^ and, since R is upper triangular, we have det(R) — Y[kLi[^]k,k- Hence, Condition [2] becomes 



k.k 



> 0,k = 1,2, . . . , M. Condition [3] is redundant since A — QR, together with Q^Q 



Q'^A — R. The uniqueness of Q and R is proven in [gj]. Sec. 2.6. 



Im, implies 
□ 



We conclude by noting that for full-rank A, the uniqueness of Q and R implies that A = QR can be 
called the QR decomposition of A with the QR factors Q and R. 
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2.3. Laurent Polynomials and Interpolation 

In the remainder of the paper, the term interpolation indicates LP interpolation, as presented in this 
section. Interpolation is a central component of the algorithms for efficient QR decomposition of polynomial 
matrices presented in Sections [5] and [7l In the following, we review basic results on interpolation and 
establish the corresponding notation. In Section [8l we will present various strategies for computationally 
efficient interpolation tailored to the problem at hand. 

Definition 3. Given a matrix-valued function A : U i^pxm ^^^^^ integers Vi,V2 > 0, the notation 
A(s) ^ {Vi, V2) indicates that there exist coefficient matrices e C''^^*^, v = —Vi, —V\ + 1, . . . , V2, such 
that 



If A(s) ^ (Vi, V2), then A(s) is a Laurent polynomial (LP) matrix with maximum degree Vi +V2. 

Before discussing interpolation, we briefly list the following statements which follow directly from Def- 
inition O First, A(s) - (Vi,V2) implies A(s) - {¥{,¥2) for any V{ > Vi.V^ > V2. Moreover, since 
for .s £ U we have s* — A(s) ^ (^1,1^2) implies A^(s) ^ (1^2,^1). Finally, given LP matri- 

ces Ai(s) ^ {ViijVu) and A2(s) ~ (V2i,V22), if Ai(s) and A2(s) have the same dimensions, then 
(Ai(s) + A2(s)) ~ (max(yii, V21), max(V^i2, V22)), whereas if the dimensions of Ai(s) and A2(s) are such 
that the matrix product Ai(s)A2(s) is deflned, then Ai(s)A2(s) ^ (Vu + V21, V12 + ^22). 

In the remainder of this section, we review basic results on interpolation by considering the LP a{s) ^ 
{Vi, V2) with maximum degree V ^ Vi + V2. The following results can be directly extended to the interpo- 
lation of LP matrices through entrywise application. Borrowing terminology from signal analysis, we call 
the value of a{s) at a given point sq £U the sample a(so). 

Definition 4. Interpolation of the LP a{s) ^ {Vi, V2) from the set B = {60, bi, . . . , 6_b-i} C U, containing B 
distinct base points^ to the set T = {tg, ti, . . . , tT-i} C lA, containing T distinct target points, is the process 
of obtaining the samples a(io), ■ ■ • , a(iT-i) from the samples a(6o), • ■ • , with knowledge 

of Vi and V2, but without explicit knowledge of the coefficients a_Vi,a_Vi+i, . . . ,av2 that determine a{s) 
according to |[5]) . 



In the following, we assume that B > V + 1. By defining the vectors a = [a-Vi o-Vi+i • • • ay^ 



— V 



s€U. 



(5) 



v=-Vi 




i(iT-i)]^, we note that ag = Ba, with the 




(6) 
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and slt — Ta, with the T x {V + 1) target point matrix 



i-Vi ,Vi-l 

''0 '-0 

.Vi ,Vi-l 

'-1 '-1 



Vi ,Vi-l 
''T-1 "-T-l 



'•I 

'•T-1 

lVi lyi lyi 



(7) 



Now, B can be written as B = DgVe, where Dg = diag(&Q\6^\ . . . ,^^-1) the i? x (1/ + 1) 

Vandermonde matrix 



1 67 



1 



,-(Vi+y2) 







Since the base points 60, 61, ... , 65-1 are distinct, has full rank [3]. Hence, rank(Vg) = V + 1, which, 
together with the fact that is nonsingular, implies that rank(B) = V + 1. Therefore, the coefficient 
vector a is uniquely determined by the B samples of a(s) at the base points 60, 61, ... , 6b-i according to 
a = B^ag, and interpolation of a{s) from ;B to T can be performed by computing 



ar 



TB^ 



ag. 



(8) 



In the remainder of the paper, we call the T x B matrix TB^ the interpolation matrix. 

We conclude this section by noting that in the special case Vi = V2, we have B — B*E and T = T*E, 
where the (V+l) x (V+l) matrix E is obtained by flipping ly+i upside down. Since the operation of taking 
the pseudoinverse commutes with entrywise conjugation, it follows that B^ — E(B'i')* and, as a consequence 
of E^ = ly+i, we obtain TB^ = (TB'I')* i.e., the interpolation matrix is real-valued. 



3. Problem Statement 

3.1. MIMO-OFDM System Model 

We consider a MIMO system [3| with Mt transmit and AI^ receive antennas. Throughout the paper, 
we focus on the case M^j > Mt- The matrix- valued impulse response of the frequency-selective MIMO 
channel is given by the taps H; e £MrxMt (; — Q, 1, . . . , L) with the corresponding matrix- valued transfer 
function 

which satisfies H(s) ^ (0, L). In a MIMO-OFDM system with N OFDM tones and a cyclic prefix of length 
-^cp > L samples, the equivalent input-output relation for the nth tone is given by 

d„ = H(s„)c„ + w„, n = 0, 1, . . . , iV - 1 
7 
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with the transmit signal vector c„ = [c„_i c„.2 • • • Cn^Mr]'^, the receive signal vector d„ = [dn,i <in,2 
the additive noise vector w„, and s„ = gj27m/jv_ jjgj.g^ ^^^^ stands for the complex- valued data symbol, 
taken from a finite constellation O, transmitted by the mth antenna on the nth tone and c?„^m is the signal 
observed at the mth receive antenna on the nth tone. For n = 0, 1, . . . , iV — 1, we assume that c„ contains 
statistically independent entries and satisfies E[c„] ~ and E[c^c„] = 1. Again for n = 0, 1, . . . , iV — 1, we 
assume that w„ is statistically independent of c„ and contains entries that are independent and identically 
distributed (i.i.d.) as CA/'(0,cr^), where cr^ denotes the noise variance and is assumed to be known at the 
receiver. 

In practice, N is typically chosen to be a power of two in order to allow for efficient OFDM processing 
based on the Fast Fourier Transform (FFT). Moreover, a small subset of the N tones is typically set aside for 
pilot symbols and virtual tones at the frequency band edges, which help to reduce out-of-band interference 
and relax the pulse-shaping filter requirements. We collect the indices corresponding to the D tones carrying 
payload data into the set I? C {0, 1, . . . , — 1}. Typical OFDM systems have D > 3icp- 

3.2. QR Decomposition in MIMO-OFDM Detectors 

Widely used al gori thms for coherent detection in MIMO-OFDM systems include successive cancela- 
tion (SC) detectors both zero-forcing (ZF) and MMSE 



detectors botn zero-torcmg {LH ) ana MMSH; and sphere decoders, both in the ori gina l 

formulation requiring ZF-based preprocessing, as well as in the MMSE-based form proposed in |l6l |. 

These detection algorithms require QR decomposition in the preprocessing step, or, more specifically, com- 
putation of matrices Q(s„) and R(s„), for all n G V, defined as follows. In the ZF case, Q(s„) and R(s„) 
are QR factors of H(s„), whereas in the MMSE case, Q(s„) and R(s„) are obtained as follows: Q(s„)R(s„) 
is the unique QR decomposition of the full-rank, {Mr + Mt) x Mt MMSE- augmented channel matrix 



H(s„) 



H(s„) 



(9) 



and Q(s„) is given by Q^'^^(s„). Taking the first Mr rows on both sides of the equation H(s„) = 
Q(s„)R(s„) yields the factorization H(s„) = Q(s„)R(s„), which is unique because of the uniqueness of 
Q(s„) and R(s„), and which we call the MMSE-QR decomposition of H(s„) with the MMSE-QR factors 
Q(s„) and R(s„). 

In the following, we briefiy describe how Q(s„) and R(s„), either derived as QR decomposition or 
as MMSE-QR decomposition of H(s„), are used in the detection algorithms listed above. SC detectors 
essentially solve the linear system of equations Q^(s„)d„ — R(s„)c„ by back-substitution (with rounding 
of the intermediate results to elements of O [l^) to obtain c„ € qMt^ Sphere decoders exploit the upper 
triangularity of R(s„) to find the symbol vector c„ g qMt ^j^^^ minimizes ||Q^(s„)d„ — R(s„)c„|p through 



an efficient tree search 



3. 



3.3. Problem Statement 

We assume that the MIMO-OFDM receiver has perfect knowledge of the samples H(s„) for 71 e £ C 
{0, 1, . . . , — 1}, with \£\ > L + 1, from which H(s„) can be obtained at any data-carrying tone n G V 
through interpolation of H(s) ~ (0, L). We note that interpolation of H(s) is not necessary if I? C £. We 
next formulate the problem statement by focusing on ZF-based detectors, which require QR decomposition 
of the MIMO-OFDM channel matrices H(s„). The problem statement for the MMSE case is analogous with 
QR decomposition replaced by MMSE-QR decomposition. 

The MIMO-OFDM receiver needs to compute QR factors Q(s„) and R(s„) of H(s„) for all data-carrying 
tones n (zV. A straightforward approach to solving this problem consists of first interpolating H(s) to ob- 
tain H(s„) at the tones n £ V and then performing QR decomposition on a per-tone basis. This method 
will henceforth be called brute-force per-tone QR decomposition. The interpolation-based QR decomposition 
algorithms presented in this paper are motivated by the following observations. First, performing QR decom- 
position on an M X M matrix requires 0{A'P) arithmetic operations {g], whereas the number of arithmetic 
operations involved in computing one sample of an M x M LP matrix by interpolation is proportional to 
the number of matrix entries M^, as interpolation of an LP matrix is performed entrywise. This comparison 
suggests that we may obtain fundamental savings in computational complexity by replacing QR decompo- 
sition by interpolation. Second, consider a fiat-fading channel, so that L — and hence H(s„) — Hq for all 
n — 0,1, . . . , N — 1. In this case, a single QR decomposition Hq = QR yields QR factors of H(s„) for all 
data-carrying tones n £ T>. A question that now arises naturally is whether for L > QR factors Q(s„) 
and R(s„), n e P, can be obtained from a smaller set of QR factors through interpolation. We will see that 
the answer is in the affirmative and will, moreover, demonstrate that interpolation-based QR decomposition 
algorithms can yield significant computational complexity savings over brute-force per-tone QR decompo- 
sition for a wide range of values of the parameters Mt, Mr, L, N, and D, which will be referred to as 
the system parameters throughout the paper. The key to formulating interpolation-based algorithms and 
realizing these complexity savings is a result on QR decomposition of LP matrices formalized in Theorem [9] 
in the next section. 



4. QR Decomposition through Interpolation 

4.1. Additional Properties of QR Decomposition 

We next set the stage for the formulation of our main technical result by presenting additional properties 
of QR decomposition of a matrix A e C^^^', with P > M, that are directly implied by Definition [TJ 

Proposition 5. Let A — QR be a QR decomposition of A. Then, for a given k G {1, 2, . . . , M} , Ai^k = 
Qi^k^i'k is a QR decomposition of Ai^k- 
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Proof. From A = QR it follows that Ai ^ = (QR)ijt = Qi.fcR^'Jj + Qfc+i.AfR-i V'^^, which simplifies to 
Ai,fc = Qi^fcRj'^, since the upper triangularity of R implies R^"^^'^^ = 0. Qi,fe and R}'^ satisfy Conditions[T] 
and[2]of Definition [1] since all columns of Qi,fc are also columns of Q and since 'R\''^ is a principal submatrix 
of R, respectively. Finally, R = Q^A implies R^'ji = {Q^ A)^''^ — Q^^Ai ^ and hence Condition [3] of 
Definition [1] is satisfied. □ 



Proposition 6. Let A = QR be a QR decomposition of A. Then, for M > 1 and for a given k e 
{2,3,..., M}, AkM ~ Qi,fc-iR^'5(,7^ = Qfe.AfRfe^M °- decomposition of Ak,M - Qi^fc-iR^^M ^ • 

Proof A = Qi,fe_iRi'''"i + Qfe^ji/R'''*"^ implies Ak,M = Qi,fc-iRfc.M ^ + ^k,M'^k',M hence Ak,M - 
Qi,fc_iR^'5vf ^ — Qfe,MRfcM- Qk,M and R^'jJ^ satisfy Conditions [1] and [2] of Definition [1] since all columns 
of Qfe,M are also columns of Q and since R^'jJ^ a principal submatrix of R, respectively. Moreover, 
R = Q'^A implies R^'m ~ (Q^'^)fe'M = Qk.M-^k.M ■ Using Q^^Qi_/j_i = 0, which follows from the fact 
that the nonzero columns of Q are orthonormal, we can write R^'^ = j^jAk.M — Qfc^MQi.fc-i-f^fcM^^ ~ 
Qj^ j^j{Ak,M — Qi.k-i'^k''M^)- Hence, Condition [3] of Definition [1] is satisfied. □ 

In order to characterize QR decomposition of A in the general case rank(A) < M, we introduce the 
following concept. 

Definition 7. The ordered column rank of A is the number 



K 



0, rank(Ai,i) = 

maxjfc e {1, 2, . . . , A/} : rank(Ai^fc) = fc}, else. 



For later use, we note that i^T = is equivalent to ai — 0, and that if < M is equivalent to A being 
rank-deficient. 

Proposition 8. QR factors Q and H of a matrix A of ordered column rank K > satisfy the following 
properties: 

2. [R]k,k>0 fork ^1,2,..., K 
3- Qi.K and 'R}'^ are unique 

4. ran(Qi,fe) = ran(Ai,fc) for k = 1,2, K 

5. ifK<M, [R]x+i,K+i = 

Proof. Since Qi,i<- and RJ'^ are QR factors of Ai^^f , as stated in Proposition [5l and since rank(Ai,A') = K, 

Properties [1] and O as well as the uniqueness of Qi.k stated in Property^ are obtained directly by applying 

Proposition [2] to the full-rank matrix Ai^k- The uniqueness of R^'^ stated in Property [3] is impHed by 

the uniqueness of Qi.a- and by R^'^ = Qf^^A, which follows from Condition [3] of Definition [TJ For 
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k — 1,2, . . . , K , ran(Qi fc) = ran(Ai,fc) is a trivial consequence of Ai ^ = Qi^^R^J'^ and of rank(Rj'^) — k, 
which follows from the fact that R}'^ is upper triangular with nonzero entries on its main diagonal. This 
proves Property m If K < M, Condition [3] of Definition [1] implies [R]k+i,x+i = (Ik+i^k+i- If Qk+i = 0, 
[R]k+i,a'+i = follows trivially. If c[k+i 0, Condition [1] of Definition [T] implies that qK+i is orthogonal 
to ran(Qi^i<-), whereas the definition of K implies that slk+i G i'an(Ai_A')- Since ran(Qi j^) = ran(Ai^i<:), 
we obtain q^^-^^aK+i = [H]k+i.k+i = 0, which proves Property [H □ 

We emphasize that for K > 0, the uniqueness of Qi,k and R^'^ has two significant consequences. First, 
the GS orthonormalization procedure (IT])-(l3l), evaluated for k = 1,2, . . . , K , determines the submatrices 
Qi,_fs- and R^'^ of the matrices Q and R produced by any QR decomposition algorithm. Second, the 
nonuniqueness of Q and R in the case of rank-deficient A, demonstrated in Section [2?2l is restricted to the 
submatrices Q_ft:+i,j\f and 

Finally, we note that Property[5]of Proposition [8] is valid for the case if = as well. In fact, Condition[3] 
of Definition [1] implies [R]i,i = qf^ai. Since K = implies ai = 0, we immediately obtain [R]i,i = 0. 

4-2. QR Decomposition of an LP Matrix 

In the remainder of SectionjH we consider a P x M LP matrix A(s) ^ (Vi, V2), s with P > M, and 
QR factors Q(s) and R(s) of A(s). Despite A(s) being an LP matrix, Q(s) and R(s) will, in general, not be 
LP matrices. To see this, consider the case where rank(A(s)) = M for all s £U. It follows from the results 
in Sections 12.21 and ITTI that, in this case, Q(s) and R(s) are unique and determined through H])-©. The 
division and the square root operation in ([2]), in general, prevent Q(s), and hence also R(s) — Q-^(s)A(s), 
from being LP matrices. Nevertheless, in this section we will show that there exists a mapping M that 
transforms Q(s) and R(s) into corresponding LP matrices Q(s) and R(s). The mapping Ai constitutes the 
basis for the formulation of interpolation-based QR decomposition algorithms for MIMO-OFDM systems. 

In the following, we consider QR factors of A(sq) for a given sq G U. In order to keep the notation 
compact, we omit the dependence of all involved quantities on sq. We start by defining the auxiliary 
variables as 

Afe^Afe_i[R]2fc, k = l,2,...,M (10) 
with Aq = 1. Next, we introduce the vectors 

qfe^Afe_i[R],.,,qfc, k = l,2,...,M (11) 
fI^^Afe_i[R],,ri^, k = l,2,...,M (12) 

and define the mapping M. : (Q, R) ^ (Q, R) by Q = [qi q2 • • • qA/] and R = [f 1 f 2 • • • f 

Now, we consider the ordered column rank K of A, and note that Property [2] in Proposition [8] implies 

that, \{ K > Q, Afe_i [R]^ t > for k — 1,2, . . . , K , as seen by unfolding the recursion in (fTOl) . Hence, for 
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K > Q and k = 1,2, . . . , K , we can compute and rj from q/c and f^, respectively, according to 

qfe = (Afe_i [R]fc J-lqfc (13) 

Jfe.fe 

where Ak-i [R]^, is obtained from the entries on the main diagonal of R as 



(A,._i[R], ,)-ifi^ (14) 



A,_i [R],,, = <; \ ^ (15) 

[R]fe_i,fc_i[R]fc,fc, k^2,3,...,K. 

If fsT = M, i.e., for full-rank A, we have A^-i [R]j, ^ 7^ for all k = 1,2,..., M, and the mapping is 
invertible. In the case K < M, Property [5] in Proposition [8] states that [R]i<-+i i^+i = 0, which combined 
with ((lOll-IIlD implies that Afe = 0, qfc = 0, and f J = for fc = X + l,K + 2,...,M. Hence, the 
mapping A4 is not invertible for K < M, since the information contained in Qk+i,m and R^+i^^^ can 
not be extracted from Q_fs-+i,j\/ = and R-f'^+i.^^ = 0. Nevertheless, we can recover Q/^+i,j\/ and R-^^+i.^^ 
as follows. For < K < M, setting k = K + \ m Proposition [6] shows that QA'+i,Af and R^^}'||^ can 
be obtained by QR decomposition of Ak+i,m — Qi^A'R^+i m- Then, R^+i^^^ is obtained as R^+i'*^ — 
[Rf^^'^^ R^ti'M ] ^i*^ = because of the upper triangularity of R. For K = 0, since Q and R 

are all-zero matrices, Qa'+i.a/ = Q and R^^J'*^ = R must be obtained by performing QR decomposition 



on A. In the remainder of the paper, we denote by inverse mapping M-^ : (Q,R) ^ (Q,R) the procedurtl 
formulated in the following steps: 

1. If if > 0, for k ~ 1,2, . . . , K , compute the scaling factor (Afc_i [RJj.^,)^^ using (fT5|) and scale q^ 
and £"1" according to lfT3)) and (fHl) . respectively. 

2. If < K < M, compute Qa+i.m and R^^J'jJ^^ by performing QR decomposition on Aa-+i^m — 
Qi,k'Rk+i,m, and construct RK+im ^ R^+} j{ ]• 

3. If if = 0, compute Q and R by performing QR decomposition on A. 

We note that the nonuniqueness of QR decomposition in the case K < M has the following consequence. 
Given QR factors Qi and Ri of A, the application of the mapping Ai to (Qi,Ri) followed by application 
of the inverse mapping ^A~^ yields matrices Q2 and R2 that may not be equal to Qi and Ri, respectively. 
However, Q2 and R2 are QR factors of A in the sense of Definition [TJ 

We are now ready to present the main technical result of this paper. This result paves the way for the 
formulation of interpolation-based QR decomposition algorithms. 

Theorem 9. Given A : U ^ £Pxm ^-^.f^ p > ^^^f^ ^f^^f. ^(^-j _ (^1,^2) with maximum degree 
V = Vi +V2. The functions Afe(s), qfe(s), and f^(s), obtained by applying the mapping A4 as in II10\) - [T^) 
to QR factors Q(s) and R(s) 0/ A(s) for all s eU, satisfy the following properties: 



^Note that for K < M, the inverse mapping M- ^ requires explicit knowledge of Kk+i,m- 

12 



1. Ak{s)^{kV,kV) 

2. qfc(s) - ((fc - l)V + Vi,{k- l)V + V2) 

3. ilis)^{kV,kV). 



We emphasize that Theorem [9] appUes to any QR factors satisfying Definition [T] and is therefore not 
affected by the nonuniqueness of QR decomposition arising in the rank-deficient case. 

Before proceeding to the proof, we note that Theorem [9] impUes that the maximum degrees of the 
LP matrices Q(s) and R(s) are (2Af — 1)V and 2MV, respectively. We can therefore conclude that 2MV +1 
base points are enough for interpolation of both Q(s) and R(s). We mention that the results presented in ^, 
in the context of narrowband MIMO systems, involving a QR decomposition algorithm that avoids divisions 
and square root operations, can be applied to the problem at hand as well. This leads to an alternative 
mapping of Q(s) and R(s) to LP matrices with maximum degrees significantly higher than 2MV. 

4.3. Proof of Theorem[E 

The proof consists of three steps, summarized as follows. In Step 1, we focus on a given sq ^ U and 
aim at writing Afc(so), (ik{so), and f J(so) as functions of A(so) for all {K{so), k) £ JC = {0, 1, . . . , A/} x 
{1,2,..., M}, where K{so) denotes the ordered column rank of A(so)- Step 1 is split into Steps la and lb, 
in which the two disjoint subsets /Ci = {{K', k') e IC : < K' < M, 1 < k' < K'} and IC2 = {{K', k') e IC : 
< K' < M, K' + 1 < k' < M} (with /Ci U /C2 = 1^) are considered, respectively. In Step la, we note that 
for {K{so),k) e /Ci, Qi,if (so) (sq) and R^'^(''°^ (sq) are unique and can be obtained by evaluating ((I])-([3l) 
for fc = 1,2, . . . ,K{so)- By unfolding the recursions in ([I])-© and in lfT0)) - (fT2l) . we write Afc(so), qfe(so), 
and f|"(so) as functions of A{sq) for {K{so), k) E JCi. In Step lb, we show that the expressions for Afe(so), 
qfe(so), and f^(so), derived in Step la for {K{so), k) G /Ci, are also valid for {K{sq), k) E IC2 and hence, as 
a consequence of /Ci U /C2 ~ JC, for all {K{so),k) £ IC. In Step 2, we note that the derivations in Step 1 
carry over to all sq e U, and generalize the expressions obtained in Step 1 to expressions for Afc(s), qfe(s), 
and f^(s) that hold for k ~ 1, 2, . . . , A/ and for all s E U. Making use of A(s) ^ (Vi, V2), in Step 3 it is 
finally shown that Afc(s), qfc(s), and f|^(s) satisfy Properties [l]-[3] in the statement of Theorem [9l 

Step la. Throughout Steps la and lb, in order to simplify the notation, we drop the dependence of all 
quantities on sq. In Step la, we assume that {K,k) £ ICi and, unless stated otherwise, all equations and 
statements involving k are valid for all k = 1,2, . . . , K . 

We start by listing preparatory results. We recall from Section HTTI that the submatrices Qi.k and R^^^ are 
unique and that, consequently, q^ and are determined by l(I|)-ll3l). From qfe 7^ 0, implied by Property [1] 
in Proposition [HI and from ^ we deduce that 7^ 0. Then, from llj and ([2]) we obtain 




(16) 
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as q^qi for i ~ 1,2, . . . , k — 1. Consequently, we can write [R]fc.fc, using ^ and ([3]), as 



[R-]fc,fc = (Ik^k — 

thus implying [R]k,kflk = Vk and hence, by ([TT]), 



yfafe 



= yvkYk 



(17) 



q/c = Afc_iyfc. (18) 
Furthermore, using ifTOl) and (fTZ)) . we can write = Afe-iy^yfc or alternatively, in recursion-free form, 

A; 

Afe = nyfy- (19) 

Next, we note that ^ implies 



yfc = afe + ^ a. 



(k). 



(20) 



with unique coefficients a[''\i = 1, 2, . . . , fc — 1, since yi = ai and since for A: > 1, we have rank(Ai^fc_i) — 
k — 1 and, as stated in Property |4] of Proposition [8l ran(Qi_fe_i) = ran(Ai fc_i). Next, we consider the 
relation between {ai,a2, . . . ,ak} and {yi,y2, ■ • ■ ,yk}- Inserting ([2]) into ^ yields 



yr afe 

Yk = ak- — fr~y« 



Hence, using ifT6|) . we obtain 



fc'-i ^ 
yf afc' 

afc' = Yk' + H y' 

i=i y^ y* 



fc' 

We next note that (1211) can be rewritten, for k' — 1,2, ... ,k, in vector-matrix form as 



a2 ■ • ■ afc 



J = [yi 



y2 



yfc]Vfe 



(21) 



(22) 



with the k X k matrix 



y fai yfa2 



y; yi 




yi yi 

yf a2 

yj^ya 







yf yi 

yf afc 

yfy2 



yt afc 

yk 
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satisfying det(Vfc) = 1 because of yfc 7^ and of (fT6|) . Next, we can write V^- as = Dj. ^U^ with the 
k X k nonsingular matrices = diag(yf yi, yf y2, . . • ,yf y^) and 

f ai yf a2 • • • yf 
yfaa ••• yfafc 



(23) 





We next express as a function of Ai^^. From lfT6)l . ifTO)) . and l(23)) . we obtain 



Afe ^ny^""^' =^^^(''^fe)■ 



Furthermore, (0), I®, and HZI) imply 



(24) 



yfe'a, = yy^yfc'Qfc'aj = [R]fe',fc'[R-]fc'.* 

which evaluates to zero for 1 < i < fc' < /c because of the upper triangularity of R. Hence, can be 
written as 



By combining l(2i)) and l(25]) . we obtain 



y" ai yf a2 • • • y^ a^ 



y2 ai y2 a2 ••• y2 ak 



yf ai yf a2 • ■ 



yfafc 



(25) 



Afe = det(Ufe) = dot 



det(AffcAi,fc) 



yf Ai,fc 




af Ai^fc 


yf Ai,fc 




af Ai^fe 


= dct 




. yf Ai^fc _ 




. af Ai^fe _ 



(26) 



(27) 



where the third equality in (|26l) can be shown by induction as follows. We start by noting that yi = ai, 
which implies that in the first row of Ufc, yi can be replaced by ai. For fc' > 1, assuming that we have 
already replaced yi, y2, • ■ ■ , Yfe'-i by ai, a2, . . . , afc/„i, respectively, we can replace yk' by a^/ since, as a 
consequence of l(20)) . the k'th row of U^: can be written as 



yf Ai,fc = a^,Ai,fc 



k'-l 

E( 



T(afAi,.)- 



Hence, replacing yf Ai^^ by af Ai^^ amounts to subtracting a linear combination of the first fc' — 1 rows 

PI 

of Ufc from the fc'th row of Ufc. This operation does not affect the value of det(Ufc) [§]. 
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Similarly to what we have done for A^, we will next show that can be expressed in terms of Ai^^ 
only. We start by noting that, since is nonsingular, we can rewrite (l22l) as 



.yi y2 ■•• Yfc. 



ai a2 



(28) 



Next, from = D^, Ufe we obtain that 



dct(Ufc) 



and hence, by (p4l) . that 



Vr^ = 



A, 



^1,1 ^2,1 ' ' ■ ^ k,l 
^2,2 ' ' ' ^ k.2 







• r 



(fc) 



(29) 



adj(Ut) 

where adj(Ufc) is upper triangular since Ufc is upper triangular, and rl'^m denotes the cofactor of Ufc relative 
to the matrix entry [Ufc]„^„i {n — 1,2, k] m — n,n + 1, k) Note that in order to handle the case 
k = 1 correctly, for which adj(Ui) = r[]l, det(Ui) = Ui = Ai, and Uj^^ = 1/Ai, we define r[]l = 1. 
From dSHl) and ^ it follows that 

k 

Yk 



A 



k 

H \ — ^ 



{k). 



Afe^i 



and therefore, by (fTSl) . we get 



qfc 



En' 



(30) 



which evaluates to qi = ai for k — 1. Next, for fc > 1 we denote by the matrix obtained by removing 

(k) 

the ith column of Ai_/j, and we express ^ as a function of ai, a2, . . . , a^ according to 



k.i 



(-l)'=+Met 



y2-^l,k\i 



yf_iAi,fcv 

= (-l)'=+Mct(Af,_iAi,,v) 
where the last equality is derived analogously to l(26l) and (l27)) . Thus, l(30)) can be written as 

f 

SLk, k = 1 

Eti(-l)'+*dct(Af,_iAi,fcv)a., fc > 1. 
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(31) 



Finally, we obtain 



(32) 



as implied by ©, ([TTl), and |(T2]). The results of Step la are the relations ([221), (Ell), and ([32]), which are 
valid for {K, k) G ICi. 

Step lb. We next show that and ^ hold for {K,k) G IC2 as well. Throughout Step lb we 

assume that {K, fc) e /C2, and, unless specified otherwise, all equations and statements involving k are valid 
for k = K + 1 , K + 2, . . . , AI. We know from Section HTTI that [RJk+i^k+i = 0. According to the definition 
of M, [R\k+i.k+i = implies = 0, = 0, and — 0. It is therefore to be shown that the RHS 
of l(27j) evaluates to zero, and that the RHS expressions of (jST]) and (|32l) evaluate to all-zero vectors. We 
start by noting that since k > K, Aijt is rank-deficient. Since rank(A^j,Ai^fc) — rank(Aijt) < k, we obtain 
that det(Af^^Ai^fc) on the RHS of l(27|) evaluates to zero. Next, for k > max{K, 1), the expression 



^(-l)'=+Met(Af,_iAi^,v)a. 



(33) 



on the RHS of l|3ip is a vector whose pth component can be written, by inverse Laplace expansion [9], as 



^(-l)^-+Met(Af,_iAi,fcv)[A]p,, = det 



Affe_iai Affe_ia2 
[A]p,i [A]p,2 



(34) 



for all p = 1, 2, . . . , P. Now, again for k > m.ax{K, 1), since Ai^k is rank-deficient, can be written as a 
linear combination 



fc-1 



afc 



(for some coefficients f]^'' \ k' — 1,2, . . . , k — 1) which implies that, for all p = 1, 2, . . . , P, the argument of 
the determinant on the RHS of ll34l) has 



Affc_iafc 



k-1 



[A]p,fc' 



as its last column. Since this column is a linear combination of the first fc — 1 columns, the determinant 
on the RHS of (|34|) is equal to zero for all p = 1, 2, . . . , P, and hence the expression in l|33p is equal to an 
all-zero vector for k > max(i^, 1). Moreover, ii K = Q and fc = 1, we have ai = on the RHS of ([3T|) . 
Hence, the RHS of (pT]) evaluates to an all-zero vector for all {K, fc) G /C2. Thus, l(3T|) simplifies to cjfe = 0, 
which in turn implies that the RHS of ((32l) evaluates to an all-zero vector as well. We have therefore shown 
that (1271), dHU), and JSll) hold for [K, k) e K.2. Finally, since /Ci U /C2 = K., the results of Steps la and lb 
imply that (HZD, and ^ are vaHd for (if, fc) e /C. 
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Step 2. We note that the derivations presented in Steps la and lb for a given sq G U do not depend on sq 
and can hence be carried over to all sq e U. Thus, we can rewrite (1271) . (ISTl) . and l(32]) . respectively, as 

Afe(s) =det(Affc(s)Ai,fc(s)) (35) 

{a/jfs), fc = 1 

(36) 
Eti(-l)'+'det(Af,_i(s)Ai,,.v(s))a,(s), fc > 1 

rl{s)^qfis)A{s) (37) 
for fc = 1,2, . . . ,M and s e Zi. 

S'iep 3. For fc = 1, 2, . . . , M , we note that A(s) ~ (Vi, ^2), along with V ^Vi+V2, implies A^j,(s)Ai,fc(s) - 
(y, V^). Now, the determinant on the RHS of (|35| can be expressed through Laplace expansion as a sum of 
products of fc entries of Af j.(s)Ai,fc(s) - (F, F). Therefore, we get Afe(s) - {kV, kV) for fc = 1, 2, . . . , M. 
Analogously, for fc = 2,3,...,M we obtain det(Af^;._;^(s)Ai^fc\,(s)) - ((fc - l)y, (fc - 1)^). The lat- 
ter result, combined with A(s) - (^^,^2) in (06]) yields qfc(s) ((fc - 1)1/ + Vi, (fc - 1)^ + V2), which 
holds for fc = 1 as well as a trivial consequence of l|36p and A(s) ^ (Vi,V2). Finally, from qfe(s) ~ 
((fc - l)y + V"i,(fc- l)F + y2) and 1371), using A(s) ~ (^1,^2) and F = V^i + V"2, we obtain f^(s) - (fc^, fc^) 
for fc = 1,2,...,M. □ 

5. Application to MIMO-OFDM 

We are now ready to show how the results derived in the previous section lead to algorithms that 
exploit the polynomial nature of the MIMO channel transfer function H(s) ^ (0, L) to perform efficient 
interpolation-based computation of QR factors of H(s„), for all n € given knowledge of H(s„) for n £ E. 
We note that the algorithms described in the following apply to QR decomposition of generic polynomial 
matrices that are oversampled on the unit circle. 

Within the algorithms to be presented, interpolation involves base points and target points on lA that 
correspond to OFDM tones indexed by integers taken from the set {0,1,..., TV — 1}. For a given set 
C {0, 1, . . . , iV — 1} of OFDM tones, we define S{X) = {s„ : n £ X} to denote the set of corresponding 
points on U. With this definition in place, we start by summarizing the brute-force approach described in 
Section [ 



Algorithm I: Brute-force per-tone QR decomposition 

1. Interpolate H(s) from S{£) to 5(X>). 

2. For each n gV, perform QR decomposition on H(s„) to obtain Q(s„) and R(s„). 



It is obvious that for large D, performing QR decomposition on a per-tone basis will result in high 

computational complexity. However, in the practically relevant case L <C f the OFDM system effectively 
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highly oversamples the MIMO channel's transfer function, so that H(s„) changes slowly across n. This 
observation, combined with the results in Section HI constitutes the basis for a new class of algorithms 
that perform QR decomposition at a small number of tones and obtain the remaining QR factors through 
interpolation. More specifically, the basic idea of interpolation-based QR decomposition is as follows. By 
applying Theorem [9] to the Mr x Mt LP matrix H(s) ^ (0, L), we obtain (\k{s) ^ {{k — l)L,kL) and 
r1{s) ^ {kL, kL) for A: = 1, 2, ... , Mt- In order to simplify the exposition, in the remainder of the paper we 
consider qfc(s) as satisfying qfc(s) ^ {kL, kL). The resulting statements 

cik{s),Yl{s)^{kL,kL), k^ 1,2,..., Mt (38) 

imply that both qfc(s) and f|"(s) can be interpolated from at least 2kL + 1 base points, and that, as a con- 
sequence of Vi = V2 = kL, the corresponding interpolation matrices are real-valued. For k — 1,2, ... , Mt, 
the interpolation-based algorithms to be presented compute qfc(s„) and f|'(s„), through QR decomposition 
followed by application of the mapping A^, at a subset of OFDM tones of cardinality at least 2kL + 1, 
then interpolate qfc(s) and fj(s) to obtain qA:(s„) and f|"(s„) at the remaining tones, and finally apply the 
inverse mapping M^^ at these tones. In the following, the sets Xfe C {0, 1, . . . , iV — 1}, with Ik-i C Xj. and 
Bk = \Tk\ > 2fc_L -I- 1 (fc 1, 2, . . . , Mt), contain the indices corresponding to the OFDM tones chosen as 
base points. For completeness, we define Iq = 0. Specific choices of the sets Tk will be discussed in detail in 
Section [H 

We start with a conceptually simple algorithm for interpolation-based QR decomposition, derived from 
the observation that the Mt statements in l(38|) can be unified into the single statement Q(s),R(s) ~ 
(MtL, MtL). This implies that we can interpolate Q(s) and R(s) from a single set of base points of 
cardinality Bmt- The corresponding algorithm can be formulated as follows: 



Algorithm II: Single interpolation step 




1. Interpolate H(s) from S{E) to S{2mt)- 




2. For each n G Jmt^ perform QR decomposition on H(s„) to obtain 


Q(s„) and R(s„). 


3. For each n G Imt, apply M : (Q(s„), R(s„)) (Q(s„), R(s„))- 




4. Interpolate Q(s) and R(s) from S{2mt) to S{'D\Imt)- 




5. For each n g V\Imt, apply : (Q(s„), R(s„)) 1-^ (Q(s„),R(s„ 


))• 



This formulation of Algorithm II assumes that H(s„) has full rank for all n e V\Imt^ which allows to 

perform all inverse mappings in Step[5]using (fT3)) - ifT5]) only. If, however, for a given n £ 'D\1mt^ H(s„) 

is rank-deficient with ordered column rank K < Mt, we have Qif+i^M,, — and R^+^'*^^(s„) = 0. 

Hence, according to the results in Section 221 Qx+i. Mt (sn) and R^+^'*^^(s„) must be computed through 

QR decomposition of H/f+i, MtC'^u) — Qi,/f(sn)R/f+i Afy(sn) for if > or of H(s„) for K = Q. This, in 

turn, requires 'H.K+i,MT{^n) to be obtained by interpolating Hi<-+i,A./y (s) from S{£) to the single target 
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point Sn in an additional step. For simplicity of exposition, in the remainder of the paper we will assume 
that H(s„) is full-rank for all n eV. 

Departing from Algorithm II, which interpolates qfc(s) and rj{s) from Bmt base points, we next present 
a more sophisticated algorithm that involves interpolation of qfc(s) and f J(s) from Bk < Bmt base points 
{k — 1,2, ... , Mt), in agreement with (|38| . The resulting Algorithm III consists of Mt iterations. In the 
first iteration, the tones n e Xi are considered. At each of these tones, QR decomposition is performed 
on H(s„), resulting in Q(s„) and R(s„), which are then mapped to (Q(s„), R(s„)) by applying M. Next, 
qi(s) and rf{s) are interpolated from the tones n g Ii to the remaining tones n e In the kth iteration 

(/c = 2, 3, . . .,Mt), the tones n e lk\Tk-i are considered. At each of these tones, Qi,fc-i(s„) and R^''''~-^(s„) 
are obtaineqj by applying to (Qi,fc_i(s„),R^''^^^(s„)), already known from the previous iterations, 

whereas the submatrices Q^.a/tCsm) and R^'j)^^(s„) are obtained by performing QR decomposition on the 
matrix Hk^MTi^n) — Qi,fc-i(sTt)RfcMv ('^")' accordance with Proposition [6l and R'^'^^^(s„) is given, for 
/c > 1, by [O R^'j)^^^(s„) ] . Next, the submatrices Qfc,j\/r (■Sn) and R'^'*^^(s„) are computed by applying M 
to (Qfc,AfT(*n);I^'^'*'^^ (■*"))■ Since the samples qfc(s„) and (s„) are now known at all tones n e Ik, qfe(s) 
and f|"(s) can be interpolated from the tones n e Ifc to the remaining tones n G I?\Ife, thereby completing 
the fcth iteration. After Mt iterations, we know Q(s„) and R(s„) at all tones n G V, as well as Q(s„) 
and R(s„) at the tones n e Imt- The last step consists of applying to (Q(s„), R(s„)) to obtain Q(sn) 

and R(s„) at the remaining tones n e 'D\Ik- The algorithm is formulated as follows: 

Algorithm III: Multiple interpolation steps 

1. Set k<^l. 

2. Interpolate tikMri^) from S(£) to S{2k\^k-i)- 

3. If fc — 1, go to Step m Otherwise, for each n e lk\Tk-i, apply M^^ : 
(Qi,fc-i(s„),Ri''=-i(s„)) ^ (Qi,fc-i(s„),Ri''^'-Hs«))- 

4. For each n e Ik\Ik-i, overwrite Hfe,Af^(s„) by Hfc_j\/^(s„) - Qi^fe_i(s„)R^'^7^^(s„). 

5. For each n e 2k\^k~i, perform QR decomposition on Hk^MTi^n) to obtain Q^.a/tIsm) and 
KMli^n), and, if fc > 1, construct R'='^'^-(s„) = [o R^'f/Jsn)]- 

6. For each n e Ife\2^fe-i, apply 7W : (Qfc.A/^ (s„), R^^^^^ («„)) ^ (Qfe,A/T R'''*^^ (sn))- 

7. Interpolate (lk{s) and (s) from to SiV\Ik). 

8. If fc = Mt, proceed to the next step. Otherwise, set fc <— fc + 1 and go back to StepHJ 

9. For each n € V\Imt, apply : (Q(s„), R(s„)) ^ (Q(s„), R(s„)). 

In comparison with Algorithm II, Algorithm III performs QR decompositions on increasingly smaller 
matrices. The corresponding computational complexity savings are, however, traded against an increase in 



^The mapping M and its inverse Ai ^ are defined on submatrices of Q(s„) and R(s„) according to H10|l - H15p . 
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interpolation effort and the computational overhead associated with Step [4l which will be referred to as 
the reduction step in what follows. Moreover, the complexity of applying M and differs for the two 

algorithms. A detailed complexity analysis provided in the next section will show that, depending on the 
system parameters, Algorithm III can exhibit smaller complexity than Algorithm II. 

We conclude this section with some remarks on ordered SC MIMO-OFDM detectors Q, which essen- 
tially permute the columns of H(s„) to perform SC detection of the transmitted data symbols according 
to a given sorting criterion (such as, e.g., V-BLAST sorting [2l||) to obtain better detection performance 



than in the unsorted case. The permutation of the columns of H(s„) can be represented by means of 
right-multipHcation of H(s„) by an Mt x Mt permutation matrix P(s„). The matrices subjected to 
QR decomposition are then given by H(s„)P(s„),n e V. If P(s„) is constant across all OFDM tones, 
i.e., P(s„) = Poi^^ G 2?, we have H(s)Po ~ (0,i) and Algorithms I-III can be applied to H(s„)Po. A 
MIMO-OFDM ordered SC detector using Algorithm II to compute QR factors of H(s)Po, along with a 



strategy for choosing Po, was presented in 



22|. If P(sn) varies across n, the matrices H(s„)P(s„),n e V, 



in general, can no longer be seen as samples of a polynomial matrix of maximum degree L D, so that the 
interpolation-based QR decomposition algorithms presented above can not be applied. 



6. Complexity Analysis 

We are next interested in assessing under which circumstances the interpolation-based Algorithms II 
and III offer computational complexity savings over the brute-force approach in Algorithm I. To this end, we 
propose a simple computational complexity metric, representative of VLSI circuit complexity as quantified 
by the product of chip area and processing delay flQ]. We note that other important aspects of VLSI 
design, including, e.g., wordwidth requirements, memory access strategies, and datapath architecture, are 
not accounted for in our analysis. Nevertheless, the proposed metric is indicative of the complexity of 
Algorithms I-III and allows to quantify the impact of the system parameters on the potential savings of 
interpolation-based QR decomposition over brute-force per-tone QR decomposition. 

In the remainder of the paper, unless explicitly specified otherwise, the term complexity refers to com- 
putational complexity according to the metric defined in Section 16.11 below. We derive the complexity of 
individual computational tasks (i.e., interpolation, QR decomposition, mapping ^A, inverse mapping 
and reduction step) in Section [621 Then, we proceed to computing the total complexity of Algorithms I-III 
in Section 16.31 Finally, in Section 16.41 we compare the complexity results obtained in Section 16.31 and we 
derive conditions on the system parameters under which Algorithms II and III exhibit lower complexity 
than Algorithm I. 
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6.1. Complexity Metric 

In the VLSI implementation of a given algorithm, a wide range of trade-offs between silicon area A 
and processing delay r can, in general, be realized [10||. Parallel processing reduces r at the expense of 
a larger A, whereas resource sharing reduces A at the expense of a larger r. However, the corresponding 
circuit transformations typically do not affect the area-delay product At significantly. For this reason, the 



area-delay product is considered a relevant indicator of algorithm complexity [10|. In the definition of the 
specific complexity metric that will be used subsequently, we only take into account the arithmetic operations 
with a significant impact on At. More specifically, we divide the operations underlying the algorithms under 
consideration into three classes, namely i) multiplications, ii) divisions and square roots, and iii) additions 
and subtractions. Class iii) operations will not be counted as they typically have a significantly lower VLSI 
circuit complexity than Class i) and Class ii) operations. 

In all algorithms presented in this paper, the number of Class i) operations is significantly larger than 
the number of Class ii) operations |f| By assuming a VLSI architecture where the Class ii) operations are 
performed by low-area high-delay arithmetical units operating in parallel to the multipliers performing the 
Class i) operations, it follows that the Class i) operations dominate the overall complexity and the Class ii) 
operations can be neglected. 

Within Class i), we distinguish between full multiplications (i.e., multiplications of two variable operands) 
and constant multiplications (i.e., multiplications of a variable . We define 

the cost of a full multiplication as the unit of computational complexity. We do not distinguish between real- 
valued full multiplications and complex-valued full multiplications, as we assume that both are performed 
by multipliers designed to process two variable complex- valued operands. The fact, discussed in detail in 
Section 18.11 that a constant multiplication can be implemented in VLSI at significantly smaller cost than a 



full multiplication, will be accounted for through a weighting factor smaller than one. 

6.2. Per- Tone Complexity of Individual Computational Tasks 

In order to simplify the notation, in the remainder of this section we drop the dependence of all quantities 
on s„. We furthermore introduce the auxiliary variable 



^We assume that division of an M-dimensional vector a by a scalar a, such as the divisions in |[2j, II13II . or Ill4p . is 

implemented by first computing the single division fj = l/a and then multiplying the M entries of a by /3, at the cost of one 

Class ii) operation and M Class i) operations, respectively. 

■^In the context of the interpolation-based algorithms considered in this paper, all operands that depend on H(s) are 

assumed variable. The coefficients of interpolation filters, e.g., are treated as constant operands. For a detailed discussion on 

the difference between full multiplications and constant multiplications, we refer to Section [sTTI 
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which specifies the maximum total number of nonzero entries in Qi,fc and R^"'"', and hence also in Qi_fc 
and R^'*^, in accordance with the fact that R and R are upper triangular. 

Interpolation. We quantify the complexity of interpolating an LP to one target point through an equivalent 
of cip full multiplications. The dependence of interpolation complexity on the underlying VLSI implementa- 
tion and on the number of base points is assumed to be incorporated into cjp . Specific strategies for efficient 
interpolation along with the corresponding values of cip are presented in Section [8l Since interpolation of 
an LP matrix is performed entrywise, the complexity of interpolating Hk^Mr (*) to one target point is given 
by 

cJ^H = Mr{Mt -k + l)cip, 1,2,..., Mt. 

Similarly, interpolation of Q(s) and R(s) to one target point has complexity 

and the complexity of interpolating qfe(s) and f^(s) to one target point is given by 

cjplqf = {Mr + Mt - fc + l)cip, fc = 1, 2, . . . , Mt. 

QR decomposition. In order to keep our discussion independent of the QR decomposition method, we denote 
the cost of performing QR decomposition on an MiiXk matrix by Cqj^'''' (fc = 1,2,..., Mt). Specific 
expressions for Cq^^'^ will only be required in the numerical complexity analysis in Section [9l 

Mapping M. We denote the overall cost of mapping (Q^^/j, , R'''*^^) to (Qfc^/r, R'''*^^) (fc = 1, 2, . . . , Mt) 
by Cj^^^ ■ In the case k ~ 1, appHcation of the mapping M. requires computation of [R]i4, [R]i.i, 
[R]f_i[R]2,2, [R.]?.i[R-]2,2: ■ • • '11^=1 [R-lL) at the cost of 2Mt - 1 full multiplications. This step yields both 
the scaling factors Afc/_i [RJ^/ fc/ , k' — 1,2, . . . , Mt, and the diagonal entries of R. From l|3ip we can deduce 
that the first column of Q is equal to the first column of H and is hence obtained at zero complexity. The 
remaining entries of Q and the entries of R above the main diagonal are obtained by scaling the corre- 
sponding entries of Q and R according to (fTTI) and l|12p , respectively, which requires Jmt ~ M^ — Mt full 
multiplications. Hence, we obtain 

c]^^'^ = Jmt - Mr + Mt - 1. 
Next, we consider the case k > \, which only occurs in Step [3] of Algorithm III, where Afe_i = [R]/j_i 
is already available from the previous iteration which involves interpolation of t'1._^{s). The appHca- 
tion of the mapping M first requires computation of Afe_i[R]fc^fc, Afc_i[R]^ j^,, Afe_i[R]^j,[R]fc+i_fc+i, . . . , 
Afc_i J|*£^[R]|j, at the cost of 2{Mt — fc + 1) full multiplications. Then, the entries of Clk.MT a-^d the 
entries of R*^'^^ above the main diagonal of R are scaled according to (fTTj) and lfT2)) . which requires 
Jmt ~ Jk-i — {Mt ~ k + 1) full multiplications. In summary, we obtain 

^m'^ = Jmt - Jk-i +MT-k + l, k^2,3,...,MT. 

23 



Table 1: Total complexity associated with the individual computational tasks 



Computational task Symbol 


Algorithm I 


Algorithm II 


Algorithm III 


Interpolation of H(s) cip_h,a 


J) i,JV/t 

^''IP.H 




-°lClP,H + ClP,H 


Interpolation of Q(s) and R(s) Cjp ^ 





{D - -Bj\/T)Cip_qR, 


fc=2 

Mr 

k = l 


QR decomposition cqr^a 


p. May-Mr 


Bm^Cq^ 


Mtp 


Mapping M cm, a 









Inverse mapping A4~^ Ca<-i,a 









Reduction Cred,A 








k = 2 



The index A is a placeholder for the algorithm number (I, II, or III). 



Inverse mapping . We denote the overall cost of mapping (Qi^/t, R-'^''^) to (Qi^/c, R^''^) {k — 1,2,..., Mr) 
by Cj^^^i. Since Aq = 1 and [R]i,i ~ [R-li,!, by first computing ([R]i^i)^/^ and then its inverse, we can 
obtain both [R]i,i and the scaling factor (Ao[R]i,i)^^ = l/[R]i,i at the cost of one square root opera- 
tion and one division. For k' = 2,3, . . . , k, the scaling factors {Ak' -i[Ii\k' ,k')~^ can be obtained according 
to ifTSj) by computing {[Ii\k'-i.k'-i[^]k' ,k')~^^^ , at the cost of fc — 1 full multiplications, k — 1 square root 
operations, and k — I divisions. The entries of Qi,fe and the remaining entries of R^'*^ on and above the 
main diagonal of R are obtained by scaling the corresponding entries of Qi^fc and R^-'"' according to lfT3)) 
and lfT4)) . respectively, at the cost of Jfc — 1 full multiplications. Since we neglect the impact of square root 
operations and divisions on complexity, we obtain 

c]^_, = Jk + k - 2, fc= 1,2,... ,Mt. 

Reduction step. Since matrix subtraction has negligible complexity, for a given k G {1, 2, . . . , Mt}, the 
complexity associated with the computation of H^^Mt ~ Qi.fc-iRfcA/T ' denoted by c^J^^, is given by the 
complexity associated with the multipHcation of the M^jx (fc— 1) matrix Qi,fc-i by the (fc — 1) x (My- fc + 1) 
matrix R^'^^^^. Hence, we obtain 

cl!^l = MR{k-l){MT-k+l). 

6.3. Total Complexity of Algorithms I III 

The contribution of a given computational task to the overall complexity of a given algorithm is obtained 

by multiplying the corresponding per-tone complexity, computed in the previous section, by the number of 
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relevant tones. For simplicity of exposition, in the ensuing analysis we restrict ourselves to the case where 
Bk = 2kL + 1 (A; = 1, 2, . . . , Mt) and Ii C X2 ^ . . . C c V, for which we obtain \lk\Tk-i\ = 2i and 
\V\lk\ = -D — 2kL — 1 (fc = 1, 2, . . . , Mt)- With the total complexity of the individual tasks summarized in 
Table [U the complexity associated with Algorithms I-III is trivially obtained as 

Ci = Cip,H,i + cqr,i (39) 

ClI = Clp_H,II + Clp,QR.,II + cqr,ii + cm,u + c^-i,ii (40) 
Cm = Cip^HJII + Cjp jjj + Cqrjii + CjVlJII + Cj^-i jii + Cred.III- (41) 

6.4- Complexity Comparisons 

In the following, we identify conditions on the system parameters and on the interpolation cost cip that 
guarantee that Algorithms II and III exhibit smaller complexity than Algorithm I. We start by comparing 
Algorithms I and II and note that 

Ci - Cii = (£> - Bmt) ( Cqr - c_^-i ^ cip 1 - BmtCj^ ■ (42) 

Hence, if cip satisfies 

9 ( MrxMt _ 1,Mt\ 

CIP < cip,„,.,„ = m^^Mt + 1) ^^^^ 

then there exists a Dmin such that C\\ < C\ for D > Dmin, i-e.. Algorithm II exhibits a lower complexity 
than Algorithm I for a sufficiently high number of data-carrying tones D. Moreover, for cjp < cipmaxji) 
increasing Bmt reduces Cj — Cn. If the inequality (^31) is met, implies, since Bmt — ^AItL + I, that for 
increasing L and with all other parameters fixed. Algorithm II exhibits smaller savings. For larger Cq^r 
again with all other parameters fixed. Algorithm II exhibits larger savings. 

In order to compare Algorithms II and III, we start from l(iO)) and and rewrite Cu — Cm as 

Cii - Cm = Acqr + Acm,m-^ + ^(^ip,HQfi ^ Cred.iii (44) 

where we have introduced 

Acqr = Cqrji — Cqrjii 
Ac^,7Vl-i — CmJI + Ca4-i,II — CA4,III ~ JII 

^Cip HQR, = cip,H,ii + Cjp QR, II - cip,H,iii - Cip QR,iii- 
From the results in Table [1] we get 

Mt 

Acqr = 2LJ2 (c-— - c--(^'^-^-+^)) (45) 

fe=2 
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which is positive since, obviously, Cg^jf^*^^ > Cq^r^*"*^^ ''^^'^ (k — 2, 3, . . . , Mt). Furthermore, again em- 
ploying the results in Table [H straightforward calculations yield 

Mt 

^Cip,HQR = -2L k{k - l)cip 

k=2 

= -^LMt(m2 -l)cip (46) 

and 

^Cm,m-' = {Bi - Bmt){Mr - l) 

= -2L{Mr~1){Mt~1). (47) 

We observe that (|44|) - (|47|) . along with the expression for Cred.iii in Table [H imply that C\\ — Cm does not 
depend on D and is proportional to L. Moreover, it follows from and (H6| that Cm < Cn is equivalent 
to cip < cip^maxjii with 

'^^■'"^''■"^ |lMt(m|-i) ■ ^^^^ 

We note that the RHS of l|48p depends solely on Mt and Mr, since Acqr, Act^i.ai-i, and Cred.iii are 
proportional to L. Hence, if Acqr + Ac^^ ^vf-i — Cred.iii > and for cjp sufHciently small. Algorithm III has 
lower complexity than Algorithm II. 



7. The MMSE Case 

In this section, we modify the QR decomposition algorithms described in Section [5] to obtain corre- 
sponding algorithms that compute the MMSE-QR decomposition, as defined in Section 13.21 of the channel 
matrices H(s„),n e V. In Section ITTTl we discuss the general concept of regularized QR decomposition, 
of which MMSE-QR decomposition is a special case. In Section 17.21 we use the results of Section 17.11 to 
formulate and analyze MMSE-QR decomposition algorithms for MIMO-OFDM. 

7.1. Regularized QR Decomposition 

In the following, we consider, as done in Section [221 a generic matrix A e C^^^'\ with P > M. 

Definition 10. The regularized QR decomposition of A with the real- valued regularization parameter a > 0, 
is the unique factorization A = QR, where the regularized QR factors Q £ £P^m ^^j^j ^ £Mxm ^^^^ 
obtained as follows: A = QR is the unique QR decomposition of the full-rank {P + M) x M augmented 
matrix A = [ A-^ ciIm I^^j ^.nd Q = Q^'^. 

In the following, we consider GS-based and UT-based algorithms for computing the regularized QR de- 
composition of A through the QR decomposition of the augmented matrix A. We will see that both classes 
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of algorithms exhibit higher complexity than the corresponding algorithms for QR decomposition of A 
described in Section [221 

GS-based QR decomposition of A produces Q, R, and, as a by-product, the M x M matrix Q^+i'^+*-'^. 
Since GS-based QR decomposition according to (IT])-(l3l) operates on entire columns of the matrix to be 
decomposed, the computation of qP+^^p+^'I can not be avoided. Thus, GS-based regularized QR decom- 
position of A has the same complexity as GS-based QR decomposition of A, which in turn has a higher 
complexity than GS-based QR decomposition of A. 

Representing the UT-based QR decomposition of A in the standard form ([4]) yields 







u ■ ■ 



02©1 



A 

alM 



Ip 
1m 



R 

(Q^)« 



(49) 



A I 



P+M . 



with the (P -I- M) x (P + M) unitary matrices 0„, u = 1, 2, . . . , [/, and where is a (P -I- M) x P matrix 
satisfying (Q-^)^Q-^ = Ip and Q^Q-^ = 0. By rewriting the RHS of ^ as 



R 

(Q^)« 



R 

m^Y-n" 



{{Ql^)P+^^P+^i)H 



(50) 



we observe that UT-based regularized QR decomposition of A according to l(49|) . besides computing R 
and Q^, yields the matrices (Q-*-)^ and (Q^+^'-^+*^)^ as by-products. As observed previously in jsl, the 
corresponding complexity overhead can not be eliminated completely, but it can be reduced by removing 
the last M columns on both sides of (|49)) . Thus, using ((50)) . we obtain the efficient UT-based regularized 
QR decomposition described by the standard form 



A 

alM 



Ip 




R 

m^y^n" 



(51) 



±\1.P\H 



which yields only ((Q ) ) as a by-product [3i|. We note that since the P x P matrix ((Q 
larger than the {P — M) x P matrix (Q-'-)-^ in Q, obtained as a by-product of UT-based QR decomposition 
of A, efHcient UT-based regularized QR decomposition of A exhibits higher complexity than UT-based 
QR decomposition of A. 

Finally, we note that since Q — Q}'^, applying the mapping M to the regularized QR factors Q and R 
of A according to (fT0l) - lfT2)) is equivalent to applying M. to the QR factors Q and R of A to obtain Q 
and R followed by extracting Q = Q^'^. With this insight, it is straightforward to verify that Theorem [9l 
formulated for QR decomposition of an LP matrix A(s), is valid for regularized QR decomposition of A(s) 
as well. 
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1.2. Application to MIMO-OFDM MMSE-Based Detectors 

With the definition of regularized QR decomposition in the previous section, we recognize that MMSE- 
QR decomposition of H(s„), defined in Section 13.21 is a special case of regularized QR decomposition 
of H(s„) obtained by setting the regularization parameter a to \/Mt<^w The modification of Algorithms I 
and II to the MMSE case is straightforward and simply amounts to replacing, in Step [2] of both algorithms, 
QR decomposition by MMSE-QR decomposition. The resulting algorithms are referred to as Algorithm I- 
MMSE and Algorithm II-MMSE, respectively. 

In the following, we compare the complexity of Algorithm I-MMSE and Algorithm II-MMSE. By de- 
noting the complexity associated with computing the MMSE-QR decomposition of an Mr x Mt matrix 
by c^Ss&QR' the overall complexity of Algorithms I-MMSE and II-MMSE is given by 



and 



I D / MrXMt MrXMt\ fKO\ 

L-II-MMSE - L'll + ^Mt ICmmSE-QR ~ ^qj^ j 



respectively. Since c^mse-qr > '^qr explained in Section lTTTl (l52l) and l(53)) imply that Ci_mmse > C'l 

and Cii_mmse > Cu, respectively. Thus, from l(39l) . (l40l) . ((52l) . and ((53l) . we get 



/ I , \ I D f I.-A/t I MrXMt 

CiI-MMSE ^^^'^^ + '^IP.QRJI + Cm-\u) + ^Mt [c^P,H + CmmSE-QR 



(cx,ii + Cjp QRji + ca<-i,ii) + Bmt (cjp^h 



filR X Mt 



1,Mt I MrxMt 
^ 4p,H ^MMSE-QR 



Cl-MMSE 



'^IP.H + '^QR 



where the inequality follows from the simple property 



(54) 



a>/3>0,7>0 =^ ^<75- 
From (IM)) we can therefore conclude that 

ClI-MMSE ^ Cii 
Cl-MMSE Cl 

which implies, assuming Cn < Cj, that the relative savings of Algorithm II-MMSE over Algorithm I-MMSE 
are larger than the relative savings of Algorithm II over Algorithm I. 

Finally, we briefiy discuss the extension of Algorithm III to the MMSE case. As a starting point, we con- 
sider the straightforward approach of applying Algorithm III to the MMSE-augmented channel matrix H(s„) 
in ^ to produce Q(s„) and R(s„) for all n g P. In the following, we denote by Q(s„) and R(s„) the matri- 
ces resulting from the application of the mapping M to (Q(s„), R(s„)). We observe that the straightforward 
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approach under consideration is inefficient, since we are only interested in obtaining Q(s„) = Q^'^^"(sn) 
and R(s„) for all n e P. Consequently, we would like to avoid computing the last Mt rows of Q(s„) at as 
many tones as possible. Now, the reduction step (i.e., StepS]) in the kih iteration of Algorithm III requires 
knowledge of Qi,a:_i(s„) at the tones n e 2k\^k-i (fc = 2, 3, . . . , Mt). Hence, at the tones n e Jk\^k-i we 
must compute all + Mt rows of Qi,fc_i(s„) anyway. In contrast, at the tones n G 'D\2mt the last Mt 
rows of Q(s„) are not required. Therefore, at the tones n £ 'D\1mt we can restrict interpolation and inverse 
mapping to Q(s„) — Q^^*^"(sn) and R(s„). 

In the following, we partition qfc(s„), the kth column of Q(s„), as 



qfe(Sn) 
^k{Sn) 



1,2,. ..,Mt 



with the Mr x 1 vector qfc(s„) and the Mt x 1 vector qfc(s„). With this notation, we can formulate the 
resulting algorithm as follows: 



Algorithm III-MMSE 

1. Set k<^l. 

2. Interpolate 'H.}~,Mt{^) from S{E) to S{2k\^k-i)- 

3. For each n e lk\Tk-i, construct Hfe^7\/y(s„) according to ([9]). 

4. If fc = 1, go to Step [6l Otherwise, for each n £ lk\Tk-i, apply M^^ : 

5. For each n G Ik\Ik-i, overwrite Hfc_Af^(s„) by Hfc,Mr(sn) - Qi.fe-i(s„)R^;^7T (s„). 

6. For each n G 2k\^k-i, perform QR decomposition on Hfc^Mr(sn) to obtain Q/c, MtIsm) 



and Rll^K^n), and, if A: > 1, construct R'=^^^^(s„) = [O R"^7(s„) ; 



7. For each n G lA^^fe-i, apply M : (Qfc.M^ (s„), R^^^^^ (s„)) ^-^ (Q^^Mt R'^'*'^ (sn)) 

8. Interpolate qfc(s) and (s) from 5(1^) to SiV\Ik)- 

9. If A; = Mt, proceed to Step[TTJ Otherwise, interpolate iik{s) from Silk) to SilMrX^k)- 

10. Set A: ^ fc + 1 and go back to Stepd 

11. For each n G V\Imt, apply : (Q(s„), R-(s„)) >-> (Q(s„), R(s„)). 



"Since qM^^C^n) is not needed, its computation in the A/yth iteration can be skipped. 



A detailed complexity analysis of Algorithm III-MMSE goes beyond the scope of this paper. We men- 
tion, however, the following important aspect of the comparison of Algorithm III-MMSE with Algorithms 
I-MMSE and II-MMSE. Step [2] of Algorithms I-MMSE and II-MMSE requires MMSE-QR decomposition, 
which is a special case of regularized QR decomposition, whereas Step [6] of Algorithm III-MMSE requires 
QR decomposition of an augmented matrix. As shown in Section [7?n the algorithms for regularized QR de- 
composition and for QR decomposition of an augmented matrix have the same complexity under a GS-based 
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approach, but not under a UT-based approach. In the latter case, Algorithms I-MMSE and II-MMSE can 
perform efficient UT-based regularized QR decomposition according to the standard form l(5T|) . whereas 
Algorithm III-MMSE must perform UT-based QR decomposition of an augmented matrix according to the 
standard form l(49|) . which results in higher complexity. This aspect does not occur in the comparison of 
Algorithm III with Algorithms I and II and will be further examined numerically in Section [931 

8. Efficient Interpolation 

Throughout this section, we consider interpolation of a generic LP a{s) ^ (Vi, V2) of maximum degree 
V = Vi +V2 from B to T, where \B\ — B and |T| — T. We note that in the context of interpolation in 
MIMO-OFDM systems, relevant for the algorithms presented in this paper, all base points and all target 
points correspond to OFDM tones. Therefore, in the following we assume that B and T satisfy the condition 

BUT C{so,si,...,SN-i}. (55) 

The complexity analysis in Section[6] showed that interpolation-based QR decomposition algorithms yield 
savings over the brute-force approach only if cip is sufficiently small. Straightforward interpolation of a(s), 
which corresponds to direct evaluation of lHJ) , is performed by carrying out the multipHcation of the T x B 
interpolation matrix TB^^ by the Bxl vector ag. The corresponding complexity is given by TB, which results 
in cip = B full multiplications per target point. In the context of interpolation-based QR decomposition, 
this complexity may be too high to get savings over the brute-force approach in Algorithms I or I-MMSE, 
since exact interpolation of qfc(s) ^ {kL, kL) and f J(s) '-^ {kL, kL) requires B > 2kL + l (k — 1,2, ... , Mt), 
with the worst case being B > 2MtL -|- 1. In this section, we present interpolation methods characterized 
by significantly smaller values of cip . As demonstrated by the numerical results in Section [9l this can then 
lead to significant savings of the interpolation-based approaches for QR decomposition over the brute-force 
approach. 

8.1. Interpolation with Dedicated Multipliers 

As already noted, the interpolation matrix TB^ is a function of B, T, Vi and V2, but not of the realization 
of the LP a{s) to be interpolated. Hence, as long as B, T, Vi and V2 do not change, multiple LPs can be 
interpolated using the same interpolation matrix TB''', which can be computed off-Hne. This observation 
leads to the first strategy for efficient interpolation, which consists of carrying out the matrix-vector product 
(TB^)ag in ([8]) through TB constant multiplications, where the entries of TB^ are constant and the entries 
of ag are variable. 

In the context of VLSI implementation, full multiplications and constant multiplications differ signifi- 
cantly. Whereas a full multiplication must be performed by a full multiplier which processes two variable 
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operands, in a constant multiplication, the fact that one of the operands, and more specifically its binary 
representation, is known a priori, can be exploited to perform binary logic simplifications that result in 
a drastically simpler circuit [lo|. The resulting multiplier, called a dedicated multiplier in the following, 
consumes only a fraction of the silicon area (down to 1/9, as reported in for complex- valued dedicated 
multipliers) required by a full multiplier, and exhibits the same processing delay. Furthermore, we mention 
that it is possible to obtain further area savings, again without affecting the processing delay, by merging K 
dedicated multipHers into a single block multiplier that jointly performs the K multiplications, according 
to a technique known as partial product sharing [ll | , which essentially exploits common bit patterns in the 
binary representations of the K coefficients to obtain circuit simplifications. For simplicity of exposition, in 
the sequel we do not consider partial product sharing. 

In the remainder of the paper, xc and XR denote the complexity associated with a constant multipli- 
cation of a complex- valued variable operand by a complex- valued and by a real- valued constant coefficient, 
respectively. Since TB^ is real-valued for Vi — V2 and complex-valued otherwise, interpolation through 
constant multipHcations with dedicated multipHers has a complexity per target point of 



Cip 



XrB, Vi = V2 
XcB, Vi^V2. 



By leaving a cautionary implementation margin from the best-effort value of 1/9 reported in we assume 
that Xc = 1/4 in the remainder of the paper. Since the multiplication of two complex- valued numbers 
requires (assuming straightforward implementation) four real- valued multipHcations, whereas multiplying a 
real- valued number by a complex- valued number requires only two real- valued multiplications, we henceforth 
assume that xr — Xc/2, which leads to xm — 1/8. 

8.2. Equidistant Base Points 

In the following, we say that the points in a set {wo,ui, . . . ,uk-i\ C U are equidistant on U \{ Uk = 
uqc^'^'^^/^ for A; = 1, 2, . . . , if — 1. So far, we discussed interpolation of a{s) ^ (Vi, V2) for generic sets B 
and T. In the remainder of Section [8] we will, however, focus on the following special case. Given integers 
B,R > 1, we consider the set of B base points B = {bk = eJ"^^^/^ : k = 0, 1, . . . , S — 1} and the set of 
T = {R- l)B target points T = {ti^B,-i)k+r-i = fefce^27rr/(/?s) ; fc 0, 1, . . . , B - 1, r = 1, 2, . . . , i? - 1}. 
We note that both the B points in B and the RB points in 6 U T = {eJ27rV(flS) ; / ^ 0, 1, . . . , i?S - 1} are 
equidistant on U. Hence, interpolation of a{s) from ,B to T essentially amounts to an i?-fold increase in the 
sampling rate of a{s) on U, and will therefore be termed upsampling of a{s) from B equidistant base points 
by a factor of R in the remainder of the paper. The corresponding base point matrix B and target point 
matrix T are constructed according to ^ and |(7]), respectively. We note that for B > V + 1, B satisfies 
B^B = BIb and hence Bt = {1/B)B". 
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We recall that the number of OFDM tones N is typically a power of two. Therefore, in order to have RB 
equidistant points on U while satisfying the condition l(55)) , in the following we constrain both B and R to 
be powers of two. Finally, in order to satisfy the condition B >V+1 mandated by the requirement of exact 
interpolation, we set B = 2ri°g(^+i)l . 

8.3. Interpolation by Fast Fourier Transform 

In the context of upsampling from B equidistant base points by a factor of R, it is straightforward to 
verify that the B x {V + I) matrix B is given by 

^^[(Wb)b-v,+i,b {Wb)i.v,+i] (56) 

and that the (i? — 1)B x (V" + 1) matrix T is obtained by removing the rows with indices in TZ = {1, R + 
1, . . . , (B - 1)7? + 1} from the RB x {V + 1) matrix 

T - [(WRB)ijB_Vi+i,i?,B (^rb)i,V2+i]- (57) 

As done in Section [273t we consider the vectors a — [a_Vi a-Vi+i ■ • • 0^2]"^) = Ba, and ar = Ta. By 
defining the B-dimensional vector a*^^) = [uq ai ■ ■ ■ • • • a^Vi o-Vi+i • • • 0,-1]^, which contains 
B ~ {V + 1) zeros between the entries and a-Vi, and by taking ((56l) into account, we can write ag = 
Ba = W^a'^^^ from which follows that a^^^^ = W^^ag. Next, we insert {R — l)B zeros into a^^^^ after 
the entry to obtain the i?i?-dimensional vector a*^^^-' = [ao ai • • • ay^ • • • a^Vi o-Vi+i • • • o-i]"^- 
Further, we define sl/suT = [a(e^") a(e^^''/^^) • • • a(eJ2^(^^-i)/«S)]T = Ta to be the vector containing the 
samples of a{s) at the points in ;BU T. We note that using (1571) we can write 

Ta = WflBa(^^). (58) 

Next, we observe that by removing the rows with indices in TZ from both sides of the equality agyT = Ta we 
obtain the equality ar — Ta. The latter observation, combined with ((58l) . implies that ar can be obtained 
by removing the rows with indices in TZ from the vector Wi^^a^^^l Finally, we note that since B and RB 
are powers of two, left-multiplication by and Wrb can be computed through a i?-point radix-2 inverse 

FFT (IFFT) and an i?B-point radix-2 FFT, respectively Q|. We can therefore conclude that FFT-based 
interpolation of a{s) from B to T can be carried out as follows: 

1. Compute the S-point radix-2 IFFT a'^) = W^^ag. 

2. Construct a'^^^ from a*^^) by inserting {R — 1)B zeros after the entry in a^^'. 

3. Compute the i?B-point radix-2 FFT agyr = Wnsa^^^l 

4. Extract ar from agyr by removing the entries of agyr with indices in TZ. 
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Now, we note that if generic radix-2 IFFT and FFT algorithms are used in Steps 1 and 3, respectively, 
the approach described above does not exploit the structure of the problem at hand and is inefficient in 
the following three aspects. First, neither the IFFT in Step 1 nor the FFT in Step 3 take into account 
that B — {V + 1) entries of a'^^ (and also, by construction, of a^-'^^^) are zero. As this inefficiency does 
not arise in the case B = V + 1 and has only marginal impact on interpolation complexity otherwise, we 
will not consider it further. Second, the FFT in Step 3 ignores the fact that a^^^' contains the (i? — 1)B 
zeros that were inserted in Step 2. Third, the values of a(s) at the base points, which are already known 
prior to interpolation, are unnecessarily computed by the FFT in Step 3 and then discarded in Step 4. 
In the following, we present a modified FFT algorithm, tailored to the problem at hand, which eliminates 
the latter two inefficiencies and leads to a significantly lower interpolation complexity than the generic 
FFT-based interpolation method described above. 

From now on, in order to simplify the notation, we assume that N — RB. Thus, with s„ = e-'2'^"/^, 
?i = 0, 1, . . . , iV — 1, the base points and the target points are given by bk = sj^k and t(fl-i)fe+r-i = SRk+r 
{k ~ 0,1, .... B — 1, r = 1,2,. 1), respectively. The derivation presented in the following will be 
illustrated through an example obtained by setting B = R = A and Vi ^ V2 + I = 2, but is valid in general 
for the case where Vi and V2 satisfy the inequalities < Vi < B /2 and < V2 < -B/2— 1, respectively. 
We note that these two inequalities, combined with B — 2ri°s(^i+^2+i)l ^ g^j.g satisfied in the case Vi — V2. 
Hence, the following derivation covers the case of interpolation of the entries of Q(s) ^ {MtL, MtL) 
and R,(s) - {MtL, MtL), as required in Algorithms II, III, II-MMSE and III-MMSE. 

The proposed modified FFT is based on a decimation-in-time radix-2 iV-point FFT, consisting of a 
scrambling stage followed by log computation stages each containing N/2 radix-2 butterflies described 
by the signal flow graph (SFG) in Fig. [1^. The twiddle factors used in the FFT butterflies are powers of 
CON = e-j2-/^. 

The SFG of the unmodified TV-point FFT is shown in Fig. [TJd. We observe that the scrambling stage at 
the beginning of the FFT (not depicted in Fig.[T}3) causes the nonzero entries a_Vi , a_yj+i, . . . , ay^ ofa*^^^) 
to be scattered rather than to appear in blocks as is the case in a*^^-^l The main idea of the proposed 
approach is to prune all SFG branches that involve multiplications and additions with operands equal to 
zero, as done in and all SFG branches that lead to the computation of the already known values of a(s) 

at the base points. The SFG of the resulting pruned FFT is shown in Fig. [2K- 

Further complexity reductions can be obtained as follows. We observe that in the pruned FFT, the 
SFG branches departing from oq, ai, . . . , ay^ contain no arithmetic operations in the first logi? computation 
stages. In contrast, the SFG branches departing from a-Vi,0'-Vi+i, ■ ■ ■ ,0,-1 contain multiplications by 
twiddle factors in each of the first log R computation stages. These multiplications can however be shifted 



®The SFG pruning approach proposed in ^5j] applies to the case Vi = only. 
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(a) (b) 



Figure 1: (a) SFG of a radix-2 butterfly (top) with twiddle factor u}^, and alternative, equivalent representation (bottom) 
needed for compact illustration in FFT SFGs. (b) SFG of the full A''-point radix-2 decimation-in-time FFT, without the 
scrambling stage. N = RB, B = R = 4, Vi = V2 + i- = 2. SFG branches depicted in grey will be pruned. 




Figure 2: SFG of the pruned W^-point FFT, without the scrambling stage, before (a) and after (b) shifting all multiplications 
from the first logiJ stages into stage 1 -|- logR. N = RB, B = R = 4, Vi = V2 + I = 2. 
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into computation stage 1 + logi? through basic SFG transformations. The result is the modified FFT 
illustrated in Fig. [2b, for which the first logi? computation stages do not contain any arithmetic operations 
and therefore have zero complexity, whereas the last logi? computation stages contain {R—l)B/2 butterfiies 
each. Thus, since each radix-2 butterfiy entails one full multiplication|f| the total complexity of FFT-based 
interpolation of a(s) from S to T is determined by the {B/2)logB full multiplications required by the 
B-point radix-2 IFFT a(^) = W^^ ag and the (-R — l)(-B/2) logi? full multiplications required in the last 
logB computation stages of the proposed modified i?i?-point FFT, which computes ar from a'^^^X The 
corresponding interpolation complexity per target point is therefore given by 

, {§\ogB) + {{R-l)§\ogB) 1 R 
CiP,FFT jj^^ 2 ]^ (59) 

We mention that a modified i?i3-point FFT can be derived, analogously to above, also in the case Vi = 
(for which V ^ V2 and B = 2r^°s(^2+i)"l ), relevant for interpolation of H(s) (0, L) in Algorithms I-III and 
I-MMSE through III-MMSE. The corresponding interpolation complexity per target point is again given 
by {591). 

Finally, we note that in MIMO-OFDM transceivers the FFT processor that performs TV-point IFFT/FFT 
for OFDM modulation/demodulation can be reused with slight modifications to carry out the i3-point 
IFFT and the proposed modified i?-B-point FFT that are needed for interpolation. Such a resource sharing 
approach reduces the silicon area associated with interpolation and hence further reduces cip^fft- The 
resulting savings will, for the sake of generality of exposition, not be taken into account in the following. 

8.4- Interpolation by FIR Filtering 

We consider upsampling of a{s) from B equidistant base points by a factor of R, as defined in Section [8?2l 
The derivations in this section are valid for arbitrary integers B,R > 1, and hence not specific to the case 
where B and R are powers of two. 

Proposition 11. In the context of upsampling from B equidistant base points by a factor of R, the B{R~ 
1) X B interpolation matrix TB^ satisfies the following properties: 

1. There exists an {R — I) x B matrix Fq such that TB^ can be written as 



i2 



FoC 



B 



(60) 



'We assume that the FFT processor does not use any dedicated multipliers. 
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with the B X B circulant matrix 



Is 1 

1 



2. The matrix Fq, as implicitly defined in i6(^) . satisfies 

[Fo]r,k+l ^ [Fo]*R-r,B-k, T = 1, 2, . . . , i? - 1, fc = 0, 1, 

Proof. Since B''' ~ {1/B)'B^, the entries of TB^ are given by 



,B-l. 



k' + l 



1 



n.(k-k')-\-r 



(61) 



v=-Vi 

for fc, A;' = 0, 1, . . . , S — 1 and r = 1, 2, . . . , i? — 1. The two properties are now established as follows: 



1. The RHS of l(6T|) remains unchanged upon replacing k and k' by (fc + l)mod_B and (fc' + l)niodi?, 
respectively. Hence, for a given r g {1, 2, . . . , i? — 1}, the B x B matrix obtained by stacking the rows 
indexed by r, (i? — 1) + r, . . . , (i? — 1)(-R — 1) + r (in this order) of TB^ is circulant. By taking Fq 
to consist of the last i? — 1 rows of TB^, and using = 1b, along with the fact that for 6 G Z, 
the multipHcation FqC^ corresponds to circularly shifting the columns of Fq to the right by bmodB 
positions, we obtain (|60|) . 

2. The entries of Fq are obtained by setting k = B — 1 in ^6T\\ and are given by 

r = l,2,...,i?-l. A:' = 0,1,.. 1. 



[Folr.fe' + l - ^ X] 



-R(k' + 1) 



Hence, for r = 1, 2, . . . , i? — 1 and fc' = 0, 1, . . . , i? — 1, we obtain 

V2 

B 



OlR-r,B-k 



1 

' = - y 



-o R-r-R(B-k') 



1 

B 



v=-Vi 

E ' 

v=-Vi 



-j2'KV - 



— [t: Ojr,fc' + l- 

□ 

We note that Property [T]in Proposition [11] implies that the matrix-vector multiplication (TB''')ag in |[8]) 
can be carried out through the application of i? — 1 FIR filters. Specifically, for r = 1, 2, . . . , i? — 1, the 
entries r,r + R, . . . ,r + [B — 1)R of ar can be obtained by computing the circular convolution of ag with 
the impulse response of length B contained in the rth row of Fq. In the remainder of the paper, we will 
say that the R — 1 FIR filters are defined by Fq. By allocating B dedicated multipliers per FIR filter (one 
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per impulse response tap), we would need a total of (i? — 1)B dedicated multipliers. We will next see that 
the complex-conjugate symmetry in the rows of Fo, formulated as Property [2] in Proposition [TTl allows to 
reduce the number of dedicated multipliers and the interpolation complexity by a factor of two. 

In the following, we assume that the multiplications of a variable complex- valued operand by a constant 
7 S C and by its complex conjugate 7* can be carried out using the same dedicated multiplier, and that 
the resulting complexity is comparable to the complexity of multiplication by 7 alone. This is justified 
as the multiplication by 7*, compared to the multiplication by 7, involves the same four underlying real- 
valued multiplications and only requires two additional sign flips, which have signiflcantly smaller complexity 
than the real-valued multipHcations. Thus, we can perform multipHcation by the coefficients [Fo],-,fe+i and 
[Fo]7?,-r.B-fe = [Fo]* k+i through a single dedicated multipHer (r = 1,2,..., i?/2, k = 0, 1, ... , B/2 — 1). 
This resource sharing approach leads to 



So far, we assumed that a{s) is interpolated from the B = 2r'°s(^+i)l base points in B, resulting in cjp 
according to (|62p . We will next show that the interpolation complexity can be further reduced by using a 
smaller number of base points B' < B. Interpolation will be exact as long as the condition B' >V + lis 
satisfied. 

As done above, we assume knowledge of the B samples a(s), s £ B. In the following, however, we require 
that for a given target point tr, the sample a{tr) is obtained by interpolation from only B' base points, 
picked from the B elements of ;B as a function oitr- For simplicity of exposition, we assume that B' is even, 
and for every tr G T we choose the B' elements of B that are located closest to tr onU. We will next show 
that the resulting interpolation of a(s) from B to T can be performed through FIR filtering. 

In the following, we define B disjoint subsets Tk oiT (satisfying Tq U 7i U . . . U Tb-i = T) and consider 
the corresponding subsets Bk of B, defined such that for all points in 7^, the B' closest base points are given 
by the elements oi Bk {k — 0,1, . . . , B — 1). We next show that the interpolation matrix corresponding to 
interpolation of a{s) from Bk to Tk is independent of k. To this end, we first consider the set of target points 
% ^ {t{B-i){R-i)+r-i ■ r ^ 1,2, . . . , R—1}, containing the R—1 target points located onU between the base 
points Bb-i and bo. The subset of S containing the B' points that are closest to every point in Tq is given by 
Bq — {bo, bi, . . . , fcs'/Z) ^B-B' /2: i'B-B' /2+1: ■ ■ ■ i Interpolation of a{s) from Bo to To involves the base 

point matrix Bq, the target point matrix Tq, and the interpolation matrix TqBJ, constructed as described in 
Section lTSl Next, for fc = 1, 2, . . . , S— 1, we denote by Bk and Tk the sets obtained by multiplying all elements 
of So and To, respectively, by e^^'^'^/^. We note that Tk contains the R—1 target points located on U between 
the base points bk-i and bk, and that Bk is the subset oiB containing the B' points that are closest to every 
point in Tk. With the unitary matrix Sfc = diag((e-''2^'=/^)^S (e^^Trfc/B^Vi-i^ (^^j2^k/ Byv^^^ interpolation 




(62) 
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of a(s) from Bk to 7^ involves the base point matrix = BpSfe, with pseudoinverse = ^bJ, the target 
point matrix = ToSfe, and the interpolation matrix TfeBj^ = ToS^S^^bJ = ToBJ (fc = 1, 2, . . . , B - 1). 
Hence, the interpolation matrix is independent of k and is the same as in the interpolation of a{s) from Bq 
to %. 

Now, interpolation of a(s) from B to T, with the constraint that the sample of a{s) at every target point 
is computed only from the samples of a(s) at the B' closest base points, amounts to performing interpolation 
of a{s) from Bk to 7^ for all fc = 0, 1, . . . , S — 1, and can be written in a single equation as ar = Fag. Here, 
the (i? — l)B X B interpolation matrix F is equal to the RHS of l(60|) . with the (i? — 1) x B matrix 

Fo= [(ToBS)i,s'/2 {ToBI)b-b'/2+ub] (63) 

which contains an all-zero submatrix of dimension {R — 1) x {B — B'). Hence, F satisfies Property [1] of 
Proposition[TTl with Fq given by l(63|) . In addition, we state without proof that Fq in (|63|) satisfies Property [2] 
of Proposition[TTJ We can therefore conclude that interpolation from the closest B' base points maintains the 
structural properties of interpolation from all B base points and, as above, can be performed by FIR filtering 
using R—1 filters with dedicated multipHers that exploit the conjugate symmetry in the rows of Fo- Since 
the rows of Fq in l(63|l contain B — B' zeros, the R—1 impulse responses now have length B' , and we obtain 



Cip 



(64) 



f B', Vi ^ V2. 



8.5. Inexact Interpolation 

The interpolation complexity l|64p of the approach described in Section 18.41 can be further reduced by 
choosing B' to be smaller than V + 1. This comes, however, at the cost of a systematic interpolation error 
and consequently leads to a trade-off between interpolation complexity and interpolation accuracy. In the 
context of MIMO-OFDM detectors, it is demonstrated in Section 19.11 that the performance degradation 
resulting from this systematic interpolation error is often negligible. In the following, we propose an ad-hoc 
method for inexact interpolation. The basic idea consists of introducing an interpolation error metric and 
formulating a corresponding optimization problem, which yields the matrix Fq that defines the FIR filters 
for inexact interpolation. 

For simplicity of exposition, we restrict our discussion to inexact interpolation of Q(s) ~ {MtL, MtL) 
and R(s) ~ [MtL, MtL) with Vi = V2 = MtL, as required in Step H] of Algorithm II. For random- valued 
MIMO channel taps Hq, Hi, ... , H^, we propose to quantify the interpolation error according to 



e(Fo) ^ E 



^ ||Q^(s„)H(s„)-R(s„)||^ 



(65) 
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where the expectation is taken over Hq, Hi, . . . , H^, and where the dependence of the RHS of ([65|) on Fq 
is imphcit through the fact that within Algorithm II, the computation of Q(s„) and R(s„) at the tones 
n e V^Lmt involves interpolation through the FIR filters defined by Fq. We mention that the metric e(Fo) 
in l|65p is relevant for MIMO-OFDM sphere decoding, and that minimization of e(Fo) does not necessar- 
ily lead to optimal detection performance. Other applications involving QR decomposition of polynomial 
matrices may require alternative error metrics. 

For upsampHng from B equidistant base points by a factor of i?, under the condition Vi — V2, the 
matrix Fq in (|63l) is a function of N, R, B, B' , and Vi. Now, we have that is a fixed system parameter 
and B = 2^^°s('^^^tL+i)] _ Moreover, R is determined by N, B, and V, since R is either given by i? = N/B 
in the case = or is a function of B and V in the case \'D\ < N. Finally, under a fixed complexity 
budget (i.e., a given value for cjp), B' is constrained by l|64p . Now, Q(s),R(s) ~ (MtL, MtL) determines 
Vi = MtL, but we propose, instead, to consider Vi as a variable parameter, so that Fq = Fo(Vi). The 
interpolation error is then minimized by first determining 

argmin e{Fo{Vi)) 
Vie{i,2, ---.MtL} 

numerically, and then performing interpolation through the FIR filters defined by Fo{V{). 



9. Numerical Results 

The results presented so far do not depend on a specific QR decomposition method. For the numerical 
complexity comparisons presented in this section, we will get more specific and assume UT-based QR decom- 
josition performed through Givens rotations and coordinate rotation digital computer (CORDIC) operations 
which is the method of choice in VLSI implementations . For A G ([^P>^M with P > M , it was 

shown in 3| that the complexity of UT-based QR decomposition of A according to the standard form ^ , 
as required in Algorithms I-III, is given by 

Cqr*' = \{P''M + PM^) - m3 - _ p + m2 + M) 

and that the complexity of efficient UT-based regularized QR decomposition of A according to the standard 
form (ini, as required in Algorithms I-MMSE and II-MMSE, is giveij^ by 

«E-QR - liP'M + PM') - \P^ + \P (66) 

The results in P| carry over, in a straightforward fashion, to UT-based QR decomposition of the augmented 
matrix [A^ ciLmV" according to the standard form l(49|) . as required in Algorithm III-MMSE, to yield 

PxM A PxAf , 3 2 , 1 

^QR,III-MMSE ~ ''MMSE-QR ^ 2 2 



'^In [31, the last term on the RHS of Il66[l was erroneously specified as — (1/2)P. 
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9.1. Efficient Interpolation and Performance Degradation 

We start by quantifying the trade-off between interpolation complexity and detection performance, de- 
scribed in Section [8751 Specifically, we evaluate the loss in detection performance as we gradually reduce B' , 
and hence also cip, in the interpolation of Q(s) and R(s), as required by Algorithm II. The corresponding 
analysis for the interpolation of qfc(s) and f k = 1,2,..., Mt, as required by Algorithm III, is more 
involved and does not yield any additional insight into the trade-off under consideration. The numerical 
results presented in the following demonstrate that for Algorithm II to have smaller complexity than Algo- 
rithm I, setting B' to a value smaller than V + 1, and hence accepting a systematic interpolation error, may 
be necessary. On the other hand, we will also see that the resulting performance degradation, in terms of 
both coded and uncoded bit error rate (BER) , can be negligible even for values of B' that are significantly 
smaller than + 1 . 

In the following, we consider a MIMO-OFDM system with D = N = 512, Mr = 4, and either Mt = 2 
or Mt — 4, operating over a frequency-selective channel with L — 15. The data symbols are drawn from 
a 16-QAM constellation. In the coded case, a rate 1/2 convolutional code with constraint length 7 and 
generator polynomials [133o 171o] is used. The receiver performs maximum-likelihood detection through 
hard-output sphere decoding. Our results are obtained through Monte Carlo simulation, where averaging 
is performed over the channel impulse response taps Hq, Hi, . . . , assumed i.i.d. CN{0, 1/{L + 1)). This 
assumption on the channel statistics, along with the average transmit power being given by E[c^c„] = 1 
and the noise variance trj, implies that the per-antenna receive signal-to-noise ratio (SNR) is 1/crJ. The 
receiver employs either Algorithm I or Algorithm II to compute Q(s„) and R(s„) at all tones. We assume 
that in Step [1] of both algorithms, H(s) ~ (0,i) is interpolated exactly from B = L + 1 = 16 equidistant 
base points by FIR filtering. Since = Vi ^ V2 = L, the corresponding interpolation complexity per target 
point is obtained from l(62|) as cip^h — (L + l)xc/2. With xc — 1/4, as assumed in Section \8A\ we gejf] 
cip.H = 2. In Step [4] of Algorithm II, we interpolate Q(s) {MtL,MtL) and R(s) - {MtL,MtL), 
with maximum degree V = 2MtL, through FIR filtering from B' < B = 2r'°s(^+i)l base points. With 
Vi ^ V2 = MtL, the corresponding interpolation complexity per target point is obtained from (IMl) as 
''IPQR ~ X¥iB'/2 with xr — 1/8, as assumed in Section [87T1 We ensure that systematic interpolation errors 
are the sole source of detection performance degradation by performing all computations in double-precision 
fioating-point arithmetic. Under inexact interpolation, for every value of B' < V + 1 we determine the 
value of V{ that minimizes the interpolation error e(Fo) in (l65)l according to the procedure described in 
Section 18.51 

^Performing interpolation of H(s) by FFT would lead to cip_h according to II59II . which with B = 16 and R = N/B = 32 
results in cip h = 64/31 ~ 2.06. Hence, in this case interpolation of H(s) by FIR filtering and by FFT have comparable 
complexity. 
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Table 2: Simulation parameters 



Mt 




vi 


''lP,qR 


C„/Ci 


Interpolation 


method 


2 


64 


30 


3.43 


0.74 


FFT, exact 


2 


64 


30 


4 


0.82 


FIR filtering 


;, exact 


2 


32 


27 


2 


0.55 


FIR filtering, 


inexact 


2 


16 


25 


1 


0.41 


FIR filtering, 


inexact 


2 


12 


23 


0.75 


0.37 


FIR filtering. 


inexact 


2 


8 


21 


0.5 


0.34 


FIR filtering. 


inexact 


4 


128 


60 


4.67 


1.08 


FFT, exact 


4 


128 


60 


8 


1.54 


FIR filtering 


;, exact 


4 


32 


50 


2 


0.71 


FIR filtering. 


inexact 


4 


24 


48 


1.5 


0.64 


FIR filtering. 


inexact 


4 


16 


42 


1 


0.57 


FIR filtering. 


inexact 


4 


8 


31 


0.5 


0.50 


FIR filtering. 


inexact 



Common to all simulations are the parameters D — N — 512, L = 15, Mr = 4, and cip_h ~ 2. 

Table [H summarizes the simulation parameters, along with the corresponding values of the interpolation 
complexity per target point Cjpqp^ and the resulting algorithm complexity ratio Cn/Ci, which quantifies 
the savings of Algorithm II over Algorithm I. The values of C\\/C\ for the case where Q(s) and R(s) are 
interpolated exactly by FFT are provided for reference. We note that for Mt — 4, exact interpolation, 
both FFT-based and through FIR filtering, results in Cu > Cj. Hence, in this case inexact interpolation 
is necessary to obtain complexity savings of Algorithm II over Algorithm I. In contrast, for Mt — 2, 
Algorithm II exhibits lower complexity than Algorithm I even in the case of exact interpolation. 

Figs. [3^1 and[3jD show the resulting BER performance for Mt = 2 and Mt — 4, respectively, both for the 
coded and the uncoded case. For uncoded transmission and inexact interpolation, we observe an error fioor 
at high SNR which rises with decreasing B' . For Mt — 2 and uncoded transmission, we can see in Fig. [3^ 
and Table [21 respectively, that an interpolation filter length of B' = 8 results in negligible performance loss 
for SNR values of up to 18 dB, and yields complexity savings of Algorithm II over Algorithm I of 66%. 
Choosing _B' = 16 yields close-to-optimum performance for SNR values of up to 24 dB and complexity 
savings of 59%. For Mt = 4 and uncoded transmission. Fig. [3h and Table [2] show that the interpolation 
filter length can be shortened from B' = 128 to B' = 8, leading to complexity savings of Algorithm II over 
Algorithm I of 50%, at virtually no performance loss in the SNR range of up to 21 dB. Setting B' = 32 
results in a performance loss, compared to exact interpolation, of less than 1 dB at BER ~ 10^^ and in 
complexity savings of 29%. In the coded case, both for AIt = 2 and AIt = 4, we can see in Figs.[3K and[3j3 
that the BER curves for Algorithm II, for all values of B' under consideration, essentially overlap with the 
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SNR [tlB] 



SNR [dB] 



(a) (b) 

Figure 3: Bit error rates as a function of SNR for different interpolation filter lengths, with and without channel coding, for 
(a) Mt = 2 and (b) Mt = 4. The results corresponding to exact QR decomposition are provided for reference. 

corresponding curves for Algorithm I for BERs down to 10~^. This observation suggests that for a given 
target BER and a given tolerated performance loss of Algorithm II over Algorithm I, the use of channel 
coding allows to employ significantly shorter interpolation filters (corresponding to a smaller Cjpgp^ and 
hence to a lower Cn, which in turn implies higher savings of Algorithm II over Algorithm I) than in the 
uncoded case. We conclude that in the practically relevant case of coded transmission, complexity savings 
of Algorithm II over Algorithm I can be obtained at negligible detection performance loss. 

9.2. Algorithm Complexity Comparisons 

The discussion in Section [8] and the numerical results in Section 19.11 demonstrated that for the case of 
upsampling from equidistant base points, small values of cjp can be achieved and inexact interpolation does 
not necessarily induce a significant detection performance loss. Therefore, in the following we assume that 
for all A: = 1,2, . . . ,Mt, the set Ik is such that 5(1^) contains Bk = |Ifc| = 2ri°S2(2fci+i)l base points that 
are equidistant on U, and assume that cip = 2. The latter assumption is in line with the values of cip^h 
and Cjp found in Section 19.11 

For D = 500, L — 15, and different values of Mt and Mn, Fig. [4^ shows the complexity of Algorithms II 
and III as percentage of the complexity of Algorithm I. We observe savings of Algorithms II and III over 
Algorithm I as high as 48% and 62%, respectively. Furthermore, we can see that Algorithm III exhibits 
a lower complexity than Algorithm II in all considered configurations. We note that the latter behavior 
is a consequence of the small value of cjp and of Algorithm III, with respect to Algorithm II, trading a 
lower QR decomposition cost against a higher interpolation cost. Moreover, we observe that the savings 
of Algorithms II and III over Algorithm I are more pronounced for larger Mr — Mt- For the special case 

42 




£ = V, where interpolation of H(s) is not necessary and Algorithm I simplifies to the computation of D 
QR decompositions, Fig. |4|d shows that the relative savings of Algorithms II and III over Algorithm I are 
somewhat reduced, but still significant. We can therefore conclude that interpolation-based QR decom- 
position, provided that the complexity of interpolation is sufficiently small, yields fundamental complexity 
savings. 

For D = 500, Mr = Mr, and different values of L, Fig. [5^1 shows the complexity of Algorithms II-MMSE 
and III-MMSE as percentage of the complexity of Algorithm I-MMSE. The fact (which also carries over 
to the savings of Algorithms II and III over Algorithm I) that the savings of Algorithms II-MMSE and 
III-MMSE over Algorithm I-MMSE are more pronounced for smaller values of L is a consequence of Bk 
being an increasing function of L. In Fig. [5^, we can see that despite the low interpolation complexity 
implied by cjp = 2, Algorithm III-MMSE may exhibit a higher complexity than Algorithm II-MMSE. This 
is a consequence of the fact that for some values of Mt, Mr, and L, the overall complexity of the UT- 
based QR decompositions with standard form (|49|) required in Algorithm III-MMSE is larger than the 
overall complexity of the efficient UT-based regularized MMSE-QR decompositions with standard form l|5ip 
required in Algorithm II-MMSE. 

Finally, Fig. [SJd shows the absolute complexity of Algorithms I-III and I-MMSE through III-MMSE as a 
function of D, for Mt = 3, Mr — 4, and L — 15. We observe that the complexity savings of Algorithms II 
and III over Algorithm I and the savings of Algorithms II-MMSE and III-MMSE over Algorithm I-MMSE 
grow linearly in D. This behavior was predicted for Algorithms I and II by the analysis in Section [631 where 
we showed that Ci — Cu is an affine function of D and is positive for small cip and large D. 
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10. Conclusions and Outlook 

On the basis of a new result on the QR decomposition of LP matrices, we formulated interpolation-based 
algorithms for computationally efficient QR decomposition of polynomial matrices that are oversampled on 
the unit circle. These algorithms are of practical relevance as they allow for an (often drastic) reduction of the 
receiver complexity in MIMO-OFDM systems. Using a complexity metric relevant for VLSI implementations, 
we demonstrated significant and fundamental complexity savings of the proposed new class of algorithms 
over brute-force per-tone QR decomposition. The savings are more pronounced for larger numbers of 
data-carrying tones and smaller channel orders. We furthermore provided strategies for low-complexity 
interpolation exploiting the specific structure of the problem at hand. 

The fact that the maximum degree of the LP matrices Q(s) and R(s) is 2MtL, although the polynomial 
MIMO transfer function matrix H(s) has maximum degree L, gives rise to the following open questions: 

• Is the mapping A4 optimal in the sense of delivering LP matrices with the lowest maximum degree? 

• Would interpolation-based algorithms for QR decomposition that explicitly make use of the unitarity 
of Q(s) allow to further reduce the number of base points required and hence lead to further complexity 
savings? 

Additional challenges include the extension of the ideas presented in this paper to sparse channel impulse 
responses, for which only few of the impulse response tap matrices are nonzero. 
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